- Title
- Mp-dissimilarity : A data dependent dissimilarity measure
- Creator
- Aryal, Sunil; Ting, Kaiming; Haffari, Gholamreza; Washio, Takashi
- Date
- 2014
- Type
- Text; Conference paper
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/160058
- Identifier
- vital:12084
- Identifier
- https://doi.org/10.1109/ICDM.2014.33
- Identifier
- ISSN: 1550-4786; ISBN: 978-1-4799-4302-9
- Abstract
- Nearest neighbour search is a core process in many data mining algorithms. Finding reliable closest matches of a query in a high-dimensional space is still a challenging task. This is because the effectiveness of many dissimilarity measures that are based on a geometric model, such as the lp-norm, decreases as the number of dimensions increases. In this paper, we examine how the data distribution can be exploited to measure dissimilarity between two instances and propose a new data-dependent dissimilarity measure called 'mp-dissimilarity'. Rather than relying on geometric distance, it measures the dissimilarity between two instances in each dimension as the probability mass in a region that encloses the two instances. It deems two instances in a sparse region to be more similar than two instances in a dense region, even though the two pairs have the same geometric distance. Our empirical results show that the proposed dissimilarity measure indeed provides a reliable nearest neighbour search in high-dimensional spaces, particularly for sparse data. Mp-dissimilarity produced better task-specific performance than the lp-norm and cosine distance in classification and information retrieval tasks.
- Publisher
- Conference Publishing Services (CPS)
- Relation
- 14th IEEE International Conference on Data Mining (2014 ICDM); Shenzhen, China; 14th-17th December 2014 p. 707-712
- Rights
- Copyright © 2014 by The Institute of Electrical and Electronics Engineers, Inc.
- Rights
- This metadata is freely available under a CC0 license
- Subject
- Accuracy; Information Retrieval; Vectors; Educational Institutions; Approximation Methods; Data Mining; Electronic Mail; Mp-Dissimilarity
- Reviewed
- Hits: 1837
- Visitors: 1731
- Downloads: 3
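
The idea stated in the abstract, measuring dissimilarity in each dimension as the probability mass of a region enclosing the two instances, can be illustrated with a rough sketch. This is only one possible reading of the abstract, not the paper's definition: the function name `mass_dissimilarity`, the choice of the closed interval between the two values as the enclosing region, the empirical estimate of the mass as a fraction of data points, and the power-mean aggregation across dimensions (by analogy with the lp-norm) are all assumptions made for illustration.

```python
import numpy as np


def mass_dissimilarity(x, y, data, p=2.0):
    """Hypothetical sketch of a mass-based dissimilarity between x and y.

    Assumption: in each dimension, the enclosing region is the closed
    interval [min(x_i, y_i), max(x_i, y_i)], and its probability mass is
    estimated as the fraction of rows of `data` that fall inside it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    data = np.asarray(data, dtype=float)   # shape (n_points, n_dims)

    lo = np.minimum(x, y)
    hi = np.maximum(x, y)

    # Boolean matrix: is point j inside the interval in dimension i?
    inside = (data >= lo) & (data <= hi)
    mass = inside.mean(axis=0)             # one mass estimate per dimension

    # Assumption: combine the per-dimension masses with a power mean.
    return float(np.mean(mass ** p) ** (1.0 / p))


# Toy 1-D data: a dense cluster near 0 and two isolated points near 10.
data = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [10.0], [11.0]]

# Both pairs are 0.5 apart geometrically.
dense_pair = mass_dissimilarity([0.0], [0.5], data)
sparse_pair = mass_dissimilarity([10.0], [10.5], data)
print(dense_pair, sparse_pair)
```

Under these assumptions the pair in the sparse region receives the smaller dissimilarity, which matches the behaviour the abstract describes for pairs with the same geometric distance.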