- Title
- A new dimensionality-unbiased score for efficient and effective outlying aspect mining
- Creator
- Samariya, Durgesh; Ma, Jiangang
- Date
- 2022
- Type
- Text; Journal article
- Identifier
- http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/188991
- Identifier
- vital:17388
- Identifier
- https://doi.org/10.1007/s41019-022-00185-5
- Identifier
- ISSN:2364-1185 (ISSN)
- Abstract
- The main aim of an outlying aspect mining algorithm is to automatically detect the subspace(s) (a.k.a. aspect(s)) in which a given data point is dramatically different from the rest of the data. To rank the subspaces for a given data point, a scoring measure is required to compute the outlying degree of that point in each subspace. In this paper, we introduce a new measure to compute outlying degree, called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which not only detects the outliers but also provides an explanation of why the selected point is an outlier. SiNNE is a dimensionally unbiased measure in its raw form, which means the scores produced by SiNNE can be compared directly across subspaces of different dimensionality. Thus, it does not require any normalization to make the score unbiased. Our experimental results on synthetic and publicly available real-world datasets reveal that (i) SiNNE produces results better than, or at least the same as, existing scores, and (ii) it improves the run time of the existing beam-search-based outlying aspect mining algorithm by at least two orders of magnitude. SiNNE allows the existing outlying aspect mining algorithm to run on datasets with hundreds of thousands of instances and thousands of dimensions, which was not possible before. © 2022, The Author(s).
- Publisher
- Springer
- Relation
- Data Science and Engineering Vol. 7, no. 2 (2022), p. 120-135
- Rights
- All metadata describing materials held in, or linked to, the repository is freely available under a CC0 licence
- Rights
- http://creativecommons.org/licenses/by/4.0/
- Rights
- Copyright © 2022, The Author(s)
- Rights
- Open Access
- Subject
- 4605 Data management and data science; Isolation based; Outlying aspect mining; Outlying degree; Subspace search
- Full Text
- Reviewed
- Funder
- A preliminary version of this paper was published in the Proceedings of the 21st International Conference on Web Information Systems Engineering (WISE) 2020. This work is supported by a Federation University Research Priority Area (RPA) scholarship, awarded to Durgesh Samariya.
File | Description | Size | Format
---|---|---|---
SOURCE1 | Published version | 3 MB | Adobe Acrobat PDF