Efficient anomaly detection by isolation using Nearest Neighbour Ensemble
- Authors: Bandaragoda, Tharindu; Ting, Kaiming; Albrecht, David; Liu, Fei; Wells, Jonathan
- Date: 2014
- Type: Text, Conference paper
- Relation: 14th IEEE International Conference on Data Mining Workshop (ICDMW 2014); Shenzhen, China; 14th December 2014 p. 698-705
- Full Text: false
- Reviewed:
- Description: This paper presents iNNE (isolation using Nearest Neighbour Ensemble), an efficient nearest neighbour-based anomaly detection method by isolation. iNNE runs significantly faster than existing nearest neighbour-based methods such as Local Outlier Factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. Compared with the existing tree-based isolation method iForest, the proposed isolation method overcomes three weaknesses of iForest that we have identified, i.e., its inability to detect local anomalies, anomalies with a low number of relevant attributes, and anomalies that are surrounded by normal instances.
Improving iForest with relative mass
- Authors: Aryal, Sunil; Ting, Kaiming; Wells, Jonathan; Washio, Takashi
- Date: 2014
- Type: Text, Conference paper
- Relation: 18th Pacific-Asia Conference, PAKDD 2014: Advances in Knowledge Discovery and Data Mining; Tainan, Taiwan; 13th-16th May 2014; published in Lecture Notes in Artificial Intelligence (subseries of Lecture Notes in Computer Science) Vol. 8444, p. 510-521
- Full Text: false
- Reviewed:
- Description: iForest uses a collection of isolation trees to detect anomalies. While it is effective in detecting global anomalies, it fails to detect local anomalies in data sets having multiple clusters of normal instances because the local anomalies are masked by normal clusters of similar density and they become less susceptible to isolation. In this paper, we propose a very simple but effective solution to overcome this limitation by replacing the global ranking measure based on path length with a local ranking measure based on relative mass that takes local data distribution into consideration. We demonstrate the utility of relative mass by improving the task-specific performance of iForest in anomaly detection and information retrieval tasks.
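
The contrast drawn in the abstract above, between a global ranking by path length and a local ranking by relative mass, can be illustrated with a minimal Python sketch. The abstract gives no formula, so the code assumes that relative mass is the ratio of a leaf's parent mass to the leaf's own mass, normalised by the subsample size. All names (build_tree, descend, mean_path_length, mean_relative_mass) and parameter defaults are hypothetical, the path length omits the usual adjustment for unsplit leaves, and this is not the authors' implementation.

```python
import random

def build_tree(points, rng, depth=0, max_depth=8):
    """Grow one isolation tree with random axis-parallel splits."""
    node = {"mass": len(points)}  # mass = number of training points in this node
    if len(points) <= 1 or depth >= max_depth:
        return node
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    if not left or not right:  # degenerate split: keep this node as a leaf
        return node
    node.update(dim=dim, split=split,
                left=build_tree(left, rng, depth + 1, max_depth),
                right=build_tree(right, rng, depth + 1, max_depth))
    return node

def build_forest(data, n_trees=100, psi=64, seed=0):
    """Build n_trees trees, each on a random subsample of at most psi points."""
    rng = random.Random(seed)
    size = min(psi, len(data))
    return [build_tree(rng.sample(data, size), rng) for _ in range(n_trees)]

def descend(tree, x):
    """Return (path_length, leaf_mass, parent_mass) for point x in one tree."""
    node, parent, depth = tree, tree, 0
    while "split" in node:
        parent = node
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth, node["mass"], parent["mass"]

def mean_path_length(forest, x):
    """Global measure: a shorter average path suggests an anomaly."""
    return sum(descend(t, x)[0] for t in forest) / len(forest)

def mean_relative_mass(forest, x):
    """Local measure (assumed form): parent mass / (leaf mass * subsample size).
    A higher value means x sits in a region sparse relative to its neighbourhood."""
    total = 0.0
    for t in forest:
        _, leaf, parent = descend(t, x)
        total += parent / (leaf * t["mass"])  # t["mass"] is the root (subsample) size
    return total / len(forest)
```

Under these assumptions, both scores are read from the same trees. The only change is which quantity is read at the leaf, which matches the abstract's description of the fix as a replacement of the ranking measure rather than of the tree-building procedure.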