Commentary: A decomposition of the outlier detection problem into a set of supervised learning problems
- Authors: Zhu, Ye , Ting, Kaiming
- Date: 2016
- Type: Text , Journal article
- Relation: Machine Learning Vol. 105, no. 2 (2016), p. 301-304
- Full Text: false
- Reviewed:
- Description: This article discusses the material in relation to iForest (Liu et al. in ACM Trans Knowl Discov Data 6(1):3, 2012) reported in a recent Machine Learning Journal paper by Paulheim and Meusel (Mach Learn 100(2–3):509–531, 2015). It presents an empirical comparison result of iForest using the default parameter settings suggested by its creator (Liu et al. 2012) and iForest using the settings employed by Paulheim and Meusel (2015). This comparison has an impact on the conclusion made by Paulheim and Meusel (2015). © 2016, The Author(s).
Isolation-based anomaly detection using nearest-neighbor ensembles
- Authors: Bandaragoda, Tharindu , Ting, Kaiming , Albrecht, David , Liu, Fei , Zhu, Ye , Wells, Jonathan
- Date: 2018
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 34, no. 4 (2018), p. 968-998
- Full Text: false
- Reviewed:
- Description: The first successful isolation-based anomaly detector, i.e., iForest, uses trees as a means to perform isolation. Although it has been shown to have advantages over existing anomaly detectors, we have identified four weaknesses, i.e., its inability to detect local anomalies, anomalies with a high percentage of irrelevant attributes, anomalies that are masked by axis-parallel clusters, and anomalies in multimodal data sets. To overcome these weaknesses, this paper shows that an alternative isolation mechanism is required and thus presents iNNE, or isolation using Nearest Neighbor Ensemble. Although relying on nearest neighbors, iNNE runs significantly faster than the existing nearest neighbor–based methods such as the local outlier factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. © 2018 Wiley Periodicals, Inc.
Detecting outlier patterns with query-based artificially generated searching conditions
- Authors: Yu, Shuo , Xia, Feng , Sun, Yuchen , Tang, Tao , Yan, Xiaoran , Lee, Ivan
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Computational Social Systems Vol. 8, no. 1 (2021), p. 134-147
- Full Text:
- Reviewed:
- Description: In the age of social computing, finding interesting network patterns or motifs is significant and critical for various areas, such as decision intelligence, intrusion detection, medical diagnosis, social network analysis, fake news identification, and national security. However, subgraph matching remains a computationally challenging problem, let alone identifying special motifs among them. This is especially the case in large heterogeneous real-world networks. In this article, we propose an efficient solution for discovering and ranking human behavior patterns based on network motifs by exploring a user's query in an intelligent way. Our method takes advantage of the semantics provided by a user's query, which in turn provides the mathematical constraint that is crucial for faster detection. We propose an approach to generate query conditions based on the user's query. In particular, we use meta paths between the nodes to define target patterns as well as their similarities, leading to efficient motif discovery and ranking at the same time. The proposed method is examined in a real-world academic network using different similarity measures between the nodes. The experiment result demonstrates that our method can identify interesting motifs and is robust to the choice of similarity measures. © 2014 IEEE.
Cyberattack triage using incremental clustering for intrusion detection systems
- Authors: Taheri, Sona , Bagirov, Adil , Gondal, Iqbal , Brown, Simon
- Date: 2020
- Type: Text , Journal article
- Relation: International Journal of Information Security Vol. 19, no. 5 (2020), p. 597-607
- Relation: http://purl.org/au-research/grants/arc/DP190100580
- Full Text:
- Reviewed:
- Description: Intrusion detection systems (IDSs) are devices or software applications that monitor networks or systems for malicious activities and signal alerts or alarms when such activity is discovered. However, an IDS may generate many false alerts, which affect its accuracy. In this paper, we develop a cyberattack triage algorithm to detect these alerts (so-called outliers). The proposed algorithm is designed using clustering, optimization and distance-based approaches. An optimization-based incremental clustering algorithm is proposed to find clusters of different types of cyberattacks. Using a special procedure, a set of clusters is divided into two subsets: normal and stable clusters. Then, outliers are found among stable clusters using an average distance between centroids of normal clusters. The proposed algorithm is evaluated using the well-known IDS data sets (Knowledge Discovery and Data Mining Cup 1999 and UNSW-NB15) and compared with some other existing algorithms. Results show that the proposed algorithm has a high detection accuracy and that its false negative rate is very low. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
- Description: This research was conducted in Internet Commerce Security Laboratory (ICSL) funded by Westpac Banking Corporation Australia. In addition, the research by Dr. Sona Taheri and A/Prof. Adil Bagirov was supported by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (DP190100580).
Mining outlying aspects on healthcare data
- Authors: Samariya, Durgesh , Ma, Jiangang
- Date: 2021
- Type: Text , Conference paper
- Relation: 10th International Conference on Health Information Science, HIS 2021, Melbourne, 25-28 October 2021, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 13079 LNCS, p. 160-170
- Full Text: false
- Reviewed:
- Description: Machine learning and artificial intelligence have a wide range of applications in the medical domain, such as detecting anomalous readings and anomalous patient health conditions. Many algorithms have been developed to solve this problem. However, they fail to answer why those entries are considered outliers. This research gap leads to the outlying aspect mining problem, which aims to discover the set of features (a.k.a. subspace) in which a given data point is dramatically different from the others. In this paper, we present an interesting application of outlying aspect mining in the medical domain. This paper aims to effectively and efficiently identify outlying aspects using different outlying aspect mining algorithms and evaluate their performance on different real-world healthcare datasets. The experimental results show that the latest isolation-based outlying aspect mining measure, SiNNE, has outstanding performance on this task and yields promising results. © 2021, Springer Nature Switzerland AG.
A generative adversarial active learning method for effective outlier detection
- Authors: Bah, Mohamed , Zhang, Ji , Yu, Ting , Xia, Feng , Li, Zhao , Zhou, Shuigeng , Wang, Hongzhi
- Date: 2022
- Type: Text , Conference paper
- Relation: 34th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2022, Virtual, online, 31 October-2 November 2022, Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI Vol. 2022-October, p. 131-139
- Full Text: false
- Reviewed:
- Description: Outlier detection is an important data mining task, and developing effective methods to detect outliers is challenging in cases where there is insufficient labeled data. Manually labeling the data is labor-intensive and time-consuming. Because of a limited number of labeled samples, the classes are unbalanced, resulting in a class-imbalance problem. Existing methods fail to address these issues holistically and fall short in generating quality outlier samples for effective outlier detection. In this paper, we propose a new solution that tackles these problems. We propose a Generative Adversarial Active Learning method (DIR-GAAL), which generates Diverse, Informative, and Representative outlier samples through active learning, and employs the mini-max game between the generator and discriminator in a generative adversarial network. We conducted extensive experiments on several benchmark datasets to evaluate the performance of our method. When compared to other benchmark methods, our method consistently demonstrates better outlier detection accuracy without being negatively affected by the class-imbalance problem. © 2022 IEEE.
An efficient pose estimation for limited-resourced MAVs using sufficient statistics
- Authors: Senthooran, Ilankaikone , Barca, Jan , Kamruzzaman, Joarder , Murshed, Manzur , Chung, Hoam
- Date: 2015
- Type: Text , Conference paper
- Relation: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2015; Hamburg; Germany; 28th September-2nd October 2015 Vol. 2015, p. 3735-3740
- Full Text: false
- Reviewed:
- Description: We present a computationally efficient RGB-D based pose estimation solution for micro aerial vehicles (MAVs) with limited computational resources, which are ideally suited as members of a swarm. Our approach applies the sufficient statistics derived for a least-squares problem to our problem context. RANSAC-based outlier detection in aligning corresponding feature points is a time-consuming operation in visual pose estimation. The additive nature of the sufficient statistics used significantly reduces the computation time of the RANSAC procedure, since the pose estimate in each test loop can be computed by reusing previously computed sufficient statistics. This eliminates the need to recompute estimates from scratch each time. A simpler hypothesis-testing method gave similar performance in terms of speed but was less accurate than our proposed method. We further increase the efficiency by reducing the problem size to four dimensions using attitude data from an Attitude and Heading Reference System (AHRS). Using a real-world dataset, we show that our algorithm saves up to 94% of computation time for the RANSAC-based procedure in pose estimation while improving the accuracy.
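
The additive property mentioned in the description above can be illustrated with a much simpler problem than pose estimation. The following is a minimal sketch, not the authors' method: it assumes a one-dimensional line fit y = a*x + b, and the class name `LineStats` and the sample points are hypothetical. It shows that the least-squares sufficient statistics of two point sets can be summed, so a fit over the combined set does not need to revisit the original points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineStats:
    """Sufficient statistics for a least-squares fit of y = a*x + b (illustrative only)."""
    n: int = 0        # number of points
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x*x
    sxy: float = 0.0  # sum of x*y

    @staticmethod
    def from_points(points):
        """Accumulate the statistics over an iterable of (x, y) pairs."""
        stats = LineStats()
        for x, y in points:
            stats = stats + LineStats(1, x, y, x * x, x * y)
        return stats

    def __add__(self, other):
        """Statistics of a union of disjoint point sets are the element-wise sums."""
        return LineStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def fit(self):
        """Solve the normal equations using only the stored sums."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a, b


# Hypothetical data lying exactly on y = 2x + 1.
core = [(0.0, 1.0), (1.0, 3.0)]
extra = [(2.0, 5.0), (3.0, 7.0)]

core_stats = LineStats.from_points(core)            # computed once
merged = core_stats + LineStats.from_points(extra)  # reused, not recomputed
slope, intercept = merged.fit()
```

In a RANSAC-style loop, the same idea would let each candidate consensus set be scored by adding or reusing cached statistics instead of refitting from the raw points; how the paper applies this to RGB-D pose estimation is not specified in the record above.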