This paper introduces a new unsupervised anomaly detector called ZERO++ which employs the number of zero appearances in subspaces to detect anomalies in categorical data. It is unique in that it works in regions of subspaces that are not occupied by data; whereas existing methods work in regions occupied by data. ZERO++ examines only a small number of low dimensional subspaces to successfully identify anomalies. Unlike existing frequency-based algorithms, ZERO++ does not involve subspace pattern searching. We show that ZERO++ is better than or comparable with the state-of-the-art anomaly detection methods over a wide range of real-world categorical and numeric data sets; and it is efficient with linear time complexity and constant space complexity which make it a suitable candidate for large-scale data sets.
This paper presents iNNE (isolation using Nearest Neighbour Ensemble), an efficient nearest neighbour-based anomaly detection method by isolation. Inne runs significantly faster than existing nearest neighbour-based methods such as Local Outlier Factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. Compared with the existing tree-based isolation method iForest, the proposed isolation method overcomes three weaknesses of iForest that we have identified, i.e., Its inability to detect local anomalies, anomalies with a low number of relevant attributes, and anomalies that are surrounded by normal instances.