Simple supervised dissimilarity measure : bolstering iForest-induced similarity with class information without learning
- Authors: Wells, Jonathan , Aryal, Sunil , Ting, Kai
- Date: 2020
- Type: Text , Journal article
- Relation: Knowledge and Information Systems Vol. 62, no. 8 (2020), p. 3203-3216
- Full Text: false
- Reviewed:
- Description: Existing distance metric learning methods require optimisation to learn a feature space into which the data are transformed, which makes them computationally expensive on large datasets. In classification tasks, they make use of class information to learn an appropriate feature space. In this paper, we present a simple supervised dissimilarity measure which does not require learning or optimisation. It uses class information to measure the dissimilarity of two data instances directly in the input space. It is a supervised version of an existing data-dependent dissimilarity measure called m_e. Our empirical results in k-NN and LVQ classification tasks show that the proposed simple supervised dissimilarity measure generally produces predictive accuracy better than or at least as good as existing state-of-the-art supervised and unsupervised dissimilarity measures. © 2020, Springer-Verlag London Ltd., part of Springer Nature.
Nearest-neighbour-induced isolation similarity and its impact on density-based clustering
- Authors: Qin, Xiaoyu , Ting, Kai , Zhu, Ye , Lee, Vincent
- Date: 2019
- Type: Text , Conference paper
- Relation: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Annual Conference on Innovative Applications of Artificial Intelligence, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, 27 January to 1 February 2019. p. 4755-4762
- Full Text:
- Reviewed:
- Description: A recent proposal of data-dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity, and propose a nearest neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly improved to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure in DBSCAN with the proposed nearest-neighbour-induced Isolation Similarity, leaving the rest of the procedure unchanged. A new type of cluster called the mass-connected cluster is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes an algorithm which detects mass-connected clusters when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected while density-connected clusters cannot. © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org).
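The abstract above describes replacing DBSCAN's distance measure with a nearest-neighbour-induced similarity. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes each partitioning is a random subsample of the data whose points act as cell centres, and that the similarity of two points is the fraction of partitionings in which both points share the same nearest centre. All function names and parameters (`psi`, `t`, `min_similarity`) are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centres):
    """Index of the centre closest to `point` (ties go to the first centre)."""
    return min(range(len(centres)), key=lambda i: sq_dist(point, centres[i]))


def build_partitionings(data, psi, t, seed=0):
    """Draw t random subsamples of psi points; each subsample defines one
    nearest-neighbour partitioning of the space (an assumption of this sketch)."""
    rng = random.Random(seed)
    return [rng.sample(data, psi) for _ in range(t)]


def isolation_similarity(x, y, partitionings):
    """Fraction of partitionings in which x and y fall into the same cell."""
    same = sum(nearest_index(x, c) == nearest_index(y, c) for c in partitionings)
    return same / len(partitionings)


def similar_neighbours(index, data, partitionings, min_similarity):
    """Neighbourhood query with similarity in place of distance: a point is a
    neighbour when its similarity to data[index] reaches the threshold."""
    return [
        j for j in range(len(data))
        if isolation_similarity(data[index], data[j], partitionings) >= min_similarity
    ]


# Toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
partitionings = build_partitionings(data, psi=2, t=100)
print(isolation_similarity(data[0], data[1], partitionings))  # within one group
print(isolation_similarity(data[0], data[3], partitionings))  # across groups
```

In a DBSCAN-style procedure, `similar_neighbours` would take the place of the usual "all points within distance eps" query, with the core-point test and cluster expansion left as they are.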
A new effective and efficient measure for outlying aspect mining
- Authors: Samariya, Durgesh , Aryal, Sunil , Ting, Kai , Ma, Jiangang
- Date: 2020
- Type: Text , Conference paper
- Relation: 21st International Conference on Web Information Systems Engineering, WISE 2020, Amsterdam. 20-24 October 2020, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12343 LNCS, p. 463-474
- Full Text: false
- Reviewed:
- Description: Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given data set. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these distance/density-based scores depend on the dimensionality of subspaces, they cannot be compared directly between subspaces of different dimensionality. Z-score normalisation has been used to make them comparable. This requires computing the outlier scores of all instances in each subspace, which adds significant computational overhead on top of already expensive density estimation and makes OAM algorithms infeasible to run on large and/or high-dimensional datasets. We also discover that Z-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which is independent of the dimensionality of subspaces. This enables the scores in subspaces with different dimensionalities to be compared directly without any additional normalisation. Our experimental results show that SiNNE produces better or at least the same results as existing scores, and that it significantly improves the runtime of an existing OAM algorithm based on beam search. © 2020, Springer Nature Switzerland AG.
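The cost of Z-score normalisation mentioned in the abstract above can be illustrated with a small sketch. It is not SiNNE and not any specific OAM algorithm: the k-th nearest neighbour distance is used here only as a stand-in for a distance-based outlier score, and the function names, the parameter `k`, and the toy data are all assumptions made for illustration. What it shows is that normalising the query's score in one subspace requires scoring every instance in that subspace first.

```python
import math
import statistics


def project(point, subspace):
    """Keep only the dimensions listed in `subspace` (a tuple of indices)."""
    return tuple(point[d] for d in subspace)


def knn_distance_score(query, others, k):
    """Stand-in outlier score: distance to the k-th nearest other point."""
    dists = sorted(math.dist(query, o) for o in others)
    return dists[k - 1]


def z_score_of_query(data, query_index, subspace, k=1):
    """Z-score of the query's outlier score within one subspace.

    Every instance must be scored to obtain the mean and standard deviation,
    so this costs O(n^2) distance computations per subspace.
    """
    projected = [project(p, subspace) for p in data]
    scores = []
    for i, p in enumerate(projected):
        others = projected[:i] + projected[i + 1:]
        scores.append(knn_distance_score(p, others, k))
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        return 0.0  # all instances score the same; no instance stands out
    return (scores[query_index] - mean) / std


# Toy data: the last point is far from the rest in dimension 0 only.
data = [(0.0, 0.0), (0.1, 1.0), (0.2, 2.0), (0.3, 3.0), (5.0, 1.5)]
query_index = 4
for subspace in [(0,), (1,), (0, 1)]:
    print(subspace, z_score_of_query(data, query_index, subspace))
```

Ranking the subspaces by the printed Z-scores is the kind of comparison the abstract refers to; a score that is comparable across dimensionalities by construction would avoid the inner loop over all instances.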