### A Hybrid data dependent dissimilarity measure for image retrieval

**Authors:** Shojanazeri, Hamid; Teng, Shyh; Lu, Guojun **Date:** 2017 **Type:** Text, Unpublished work **Full Text:** **Description:** Abstract—In image retrieval, an effective dissimilarity measure is required to retrieve perceptually similar images. Minkowski-type (lp) distance is widely used for image retrieval; however, it has limitations. It focuses on the distance between image features and ignores the data distribution of the image features, which can play an important role in measuring the perceptual similarity of images. It also favours the most dominant components in calculating the total dissimilarity. A data-dependent measure named mp-dissimilarity, which estimates the dissimilarity using the data distribution, has been proposed recently. Rather than relying on geometric distance, it measures the dissimilarity between two instances in each dimension as a probability mass in a region that encloses the two instances. It considers two instances in a sparse region to be more similar than the same two instances in a dense region. Using the probability of data mass enables all the dimensions of the feature vectors to contribute to the final estimate of dissimilarity, so it is not heavily biased towards the most dominant components. However, relying only on the data distribution and completely ignoring the geometric distance raises another limitation: two instances may be judged similar only because they lie in a sparse region, yet if the geometric distance between them is large, they are not perceptually similar. To address this limitation, we propose a new hybrid data-dependent dissimilarity (HDDD) measure that considers both the data distribution and the geometric distance. Our experimental results using the Corel database and Caltech 101 show that HDDD leads to higher image retrieval performance than lp distance (lpD) and mp.
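The mp-style, data-dependent idea described in this abstract can be sketched in a few lines: per dimension, the dissimilarity between two instances is the probability mass of the data falling in the interval they span, so a pair in a dense region scores as more dissimilar than the same pair in a sparse region. The p-norm-style aggregation and the smoothing term below are illustrative assumptions, not the exact formulation from the paper.

```python
import numpy as np

def mp_dissimilarity(x, y, data, p=2.0):
    """Sketch of an mp-style data-dependent dissimilarity.

    Per dimension, the dissimilarity is the probability mass of `data`
    falling in the interval spanned by the two instances; the per-dimension
    masses are then aggregated with a p-norm-style mean (aggregation and
    smoothing are assumptions made for this sketch).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    inside = (data >= lo) & (data <= hi)              # shape (N, d)
    # Laplace-style smoothing keeps the mass nonzero in empty regions.
    mass = (inside.sum(axis=0) + 1.0) / (len(data) + 1.0)
    return float(np.mean(mass ** p) ** (1.0 / p))
```

Because every dimension contributes a bounded mass in [0, 1], no single dominant component can swamp the aggregate, which is the property the abstract contrasts with lp distance.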

### A new image dissimilarity measure incorporating human perception

**Authors:** Shojanazeri, Hamid; Teng, Shyh; Aryal, Sunil; Zhang, Dengsheng; Lu, Guojun **Date:** 2018 **Type:** Text, Unpublished work **Full Text:** **Description:** Pairwise (dis)similarity measure of data objects is central to many applications of image analytics, such as image retrieval and classification. Geometric distance, particularly Euclidean distance (

### A new perceptual dissimilarity measure for image retrieval and clustering

**Authors:** Shojanazeri, Hamid **Date:** 2018 **Type:** Text, Thesis, PhD **Full Text:** **Description:** Image retrieval and clustering are two important tools for analysing and organising images. A dissimilarity measure is central to both image retrieval and clustering, and the performance of image retrieval and clustering algorithms depends on the effectiveness of that measure. ‘Minkowski’ distance, or more specifically ‘Euclidean’ distance, is the most widely used dissimilarity measure in image retrieval and clustering. Euclidean distance depends only on the geometric position of two data instances in the feature space and completely ignores the data distribution. However, data distribution has an effect on human perception: psychologists have proposed the argument that two data instances in a dense area are perceptually more dissimilar than the same two instances in a sparser area. Based on this idea, a dissimilarity measure called ‘mp’ has been proposed to address Euclidean distance’s limitation of ignoring the data distribution. mp relies on the data distribution to calculate the dissimilarity between two instances: as prescribed in mp, higher data mass between two data instances implies higher dissimilarity, and vice versa. mp relies only on the data distribution and completely ignores the geometric distance in its calculations. In aggregating the dissimilarities between two instances over all the dimensions of the feature space, both Euclidean distance and mp give the same priority to all dimensions. This may result in a situation where the final dissimilarity between two data instances is determined by a few dimensions of the feature vectors with relatively much higher values. As a result, the dissimilarity derived may not align well with human perception.
The need to address the limitations of Minkowski distance measures, along with the importance of a dissimilarity measure that considers both geometric distance and the perceptual effect of data distribution, motivated this thesis. It studies the performance of mp for image retrieval and investigates a new dissimilarity measure that combines both Euclidean distance and data distribution. In addition, it studies the performance of such a dissimilarity measure for image retrieval and clustering. Our performance study of mp for image retrieval shows that relying only on data distribution to measure dissimilarity results in some situations where mp’s measurement is contrary to human perception. This thesis introduces a new dissimilarity measure called the perceptual dissimilarity measure (PDM). PDM considers the perceptual effect of data distribution in combination with Euclidean distance, and has two variants, PDM1 and PDM2. PDM1 focuses on improving mp by weighting it with Euclidean distance in situations where mp may not retrieve accurate results. PDM2 considers the effect of data distribution on the perceived dissimilarity measured by Euclidean distance, and proposes a weighting system for Euclidean distance using a logarithmic transform of data mass. The proposed PDM variants have been used as alternatives to Euclidean distance and mp to improve accuracy in image retrieval. Our results show that PDM2 consistently performed the best compared to Euclidean distance, mp and PDM1. PDM1’s performance was less consistent: although it performed better than mp in all the experiments, it could not outperform Euclidean distance in some cases. Following the promising results of PDM2 in image retrieval, we have studied its performance for image clustering. k-means is the most widely used clustering algorithm in scientific and industrial applications.
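The abstract describes PDM2 only as Euclidean distance weighted by a logarithmic transform of data mass; the exact weighting formula is not given here, so the `1 + log1p(mass)` form below is purely an illustrative assumption. It reduces to plain Euclidean distance in empty regions and amplifies the distance in dense ones, which matches the stated intent of combining geometric distance with the perceptual effect of data distribution.

```python
import numpy as np

def pdm2(x, y, data):
    """Illustrative PDM2-style measure (weight form assumed, not from the thesis):
    Euclidean distance with each dimension's squared difference scaled by a
    logarithmic transform of the data mass lying between the two instances."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    # Per-dimension fraction of data points between the two instances.
    mass = ((data >= lo) & (data <= hi)).mean(axis=0)
    # Dense region -> weight > 1; empty region -> weight 1 (pure Euclidean).
    w = 1.0 + np.log1p(mass)
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))
```

Unlike the mp measure, geometric distance still dominates here: two far-apart instances in a sparse region remain far apart, which is the behaviour the thesis argues aligns better with human perception.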
k-medoids is the clustering algorithm closest to k-means. Unlike k-means, which works only with Euclidean distance, k-medoids allows an arbitrary dissimilarity measure to be chosen. We have used Euclidean distance, mp and PDM2 as the dissimilarity measure in k-medoids and compared the results with k-means. Our clustering results show that PDM2 performed the best overall. This confirms our retrieval results and identifies PDM2 as a suitable dissimilarity measure for image retrieval and clustering. **Description:** Doctor of Philosophy
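The property the thesis exploits, that k-medoids accepts any pairwise dissimilarity function, can be shown with a minimal alternating-style k-medoids sketch. This is a simplified illustration, not the specific k-medoids variant used in the thesis; in practice one would pass `mp_dissimilarity` or a PDM2-style callable as `dissim`.

```python
import numpy as np

def k_medoids(data, k, dissim, max_iter=100, seed=0):
    """Minimal alternating k-medoids accepting an arbitrary pairwise
    dissimilarity callable `dissim(a, b) -> float` (sketch only)."""
    n = len(data)
    # Precompute the full pairwise dissimilarity matrix once.
    D = np.array([[dissim(data[i], data[j]) for j in range(n)] for i in range(n)])
    medoids = np.random.default_rng(seed).choice(n, size=k, replace=False)
    labels = np.argmin(D[:, medoids], axis=1)
    for _ in range(max_iter):
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:  # keep the old medoid if a cluster empties
                # New medoid minimises total dissimilarity within its cluster.
                costs = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(costs)]
        labels = np.argmin(D[:, new], axis=1)
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, labels
```

Swapping the dissimilarity is just a matter of passing a different callable, e.g. `k_medoids(data, 3, lambda a, b: float(np.linalg.norm(a - b)))` for Euclidean distance.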