Isolation set-kernel and its application to multi-instance learning
- Authors: Xu, Bi-Cun , Ting, Kaiming , Zhou, Zhi-Hua
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019; Anchorage, United States; 4th-8th August 2019 p. 941-949
- Full Text: false
- Reviewed:
- Description: Set-level problems are as important as instance-level problems. The core issue in solving set-level problems is how to measure the similarity between two sets. This paper investigates data-dependent kernels that are derived directly from data. We introduce Isolation Set-Kernel, which is solely dependent on data distribution, requiring neither class information nor explicit learning. In contrast, most current set-similarities do not depend on the underlying data distribution. We theoretically analyze the characteristics of Isolation Set-Kernel. As the set-kernel has a finite feature map, we show that this can be used to speed up the set-kernel computation significantly. We apply Isolation Set-Kernel to Multi-Instance Learning (MIL) using an SVM classifier, and demonstrate that it outperforms other set-kernels and other solutions to the MIL problem.
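As a rough illustration of how a set kernel with a finite feature map can work, the sketch below embeds each set as the average of one-hot region indicators over random Voronoi-style partitions, then takes the dot product of two embeddings. This is a simplified stand-in based only on the abstract, not the authors' construction; the partitioning scheme and normalisation here are assumptions.

```python
def cell(x, sample):
    """Voronoi-style region index of x: the nearest sampled point."""
    return min(range(len(sample)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, sample[i])))

def set_embedding(S, partitions):
    """Finite feature map of a set: for each random partition, the
    average one-hot cell indicator over the set's instances,
    concatenated across partitions."""
    phi = []
    for sample in partitions:
        counts = [0.0] * len(sample)
        for x in S:
            counts[cell(x, sample)] += 1.0 / len(S)
        phi.extend(counts)
    return phi

def set_kernel(S1, S2, partitions):
    """Set similarity: dot product of finite embeddings, averaged
    over the number of partitions."""
    e1, e2 = set_embedding(S1, partitions), set_embedding(S2, partitions)
    return sum(a * b for a, b in zip(e1, e2)) / len(partitions)
```

In practice the partitions would be drawn from the data itself, e.g. `[random.sample(data, psi) for _ in range(t)]`, which is what makes the resulting similarity data dependent; the finite embedding also means set similarity reduces to a cheap dot product, matching the speed-up claim above.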
Isolation kernel and its effect on SVM
- Authors: Ting, Kaiming , Zhu, Yue , Zhou, Zhi-Hua
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018; London, United Kingdom; 19th-23rd August 2018 p. 2329-2337
- Full Text: false
- Reviewed:
- Description: This paper investigates data-dependent kernels that are derived directly from data. This has been an outstanding issue for about two decades, which has hampered the development of kernel-based methods. We introduce Isolation Kernel, which is solely dependent on data distribution, requiring neither class information nor explicit learning. In contrast, existing data-dependent kernels rely heavily on class information and explicit learning to produce a classifier. We show that Isolation Kernel closely approximates a data-independent kernel function, the Laplacian kernel, under uniform density distribution. With this revelation, Isolation Kernel can be viewed as a data-dependent kernel that adapts a data-independent kernel to the structure of a dataset. We also provide a reason why the proposed data-dependent kernel enables SVM (which employs a kernel through other means) to improve its predictive accuracy. The key differences between the Random Forest kernel and Isolation Kernel are discussed to examine why the latter is a more successful tree-based kernel.
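A minimal sketch of a data-dependent kernel in this spirit: the similarity of two points is the fraction of random partitions of the space in which they fall into the same isolating region. The nearest-neighbour (Voronoi) partitioning below is an illustrative choice, not the paper's tree-based construction.

```python
import random

def build_partitions(data, psi, t, rng):
    """Draw t subsamples of psi points; each induces a Voronoi-style
    partition (a point belongs to the region of its nearest sampled
    point)."""
    return [rng.sample(data, psi) for _ in range(t)]

def region(x, sample):
    """Index of the isolating region containing x."""
    return min(range(len(sample)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, sample[i])))

def isolation_kernel(x, y, partitions):
    """Similarity of x and y: the fraction of partitions in which the
    two points fall into the same region. Because the partitions are
    drawn from the data, two points in a sparse area are more likely
    to share a region than two equally distant points in a dense area,
    which is what makes the kernel data dependent."""
    return sum(region(x, s) == region(y, s) for s in partitions) / len(partitions)
```

By construction the value lies in [0, 1], any point has similarity 1 with itself, and the measure is symmetric, so it can be plugged into an SVM as a precomputed kernel matrix.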
Neighbourhood contrast : A better means to detect clusters than density
- Authors: Chen, Bo , Ting, Kaiming
- Date: 2018
- Type: Text , Conference paper
- Relation: 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018; Melbourne, Australia; 3rd-6th June 2018; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10939 LNAI, p. 401-412
- Full Text: false
- Reviewed:
- Description: Most density-based clustering algorithms suffer from large density variations among clusters. This paper proposes a new measure called Neighbourhood Contrast (NC) as a better alternative to density in detecting clusters. The proposed NC allows all local density maxima, regardless of their densities, to have similar NC values. Due to this unique property, NC is a better means to detect clusters in a dataset with large density variations among clusters. We provide two applications of NC. First, replacing density with NC in the current state-of-the-art clustering procedure DP leads to significantly improved clustering performance. Second, we devise a new clustering algorithm called Neighbourhood Contrast Clustering (NCC) which requires neither density nor distance calculations, and therefore has linear time complexity in terms of dataset size. Our empirical evaluation shows that both NC-based methods outperform density-based methods, including the current state-of-the-art.
Discover multiple novel labels in multi-instance multi-label learning
- Authors: Zhu, Yue , Ting, Kaiming , Zhou, Zhi-Hua
- Date: 2017
- Type: Text , Conference paper
- Relation: Thirty-First AAAI Conference on Artificial Intelligence p. 2977-2984
- Full Text: false
- Reviewed:
- Description: Multi-instance multi-label learning (MIML) is a learning paradigm where an object is represented by a bag of instances and each bag is associated with multiple labels. The ordinary MIML setting assumes a fixed target label set. In real applications, multiple novel labels may exist outside this set, hidden in the training data and unknown to the MIML learner. Existing MIML approaches are unable to discover these hidden novel labels, let alone predict them in previously unseen test data. In this paper, we propose the first approach to discover multiple novel labels in the MIML problem using an efficient augmented Lagrangian optimization, which has a bag-dependent loss term and a bag-independent clustering regularization term, enabling the known labels and multiple novel labels to be modeled simultaneously. The effectiveness of the proposed approach is validated in experiments.
New class adaptation via instance generation in one-pass class incremental learning
- Authors: Zhu, Yue , Ting, Kaiming , Zhou, Zhi-Hua
- Date: 2017
- Type: Text , Conference paper
- Relation: 2017 IEEE International Conference on Data Mining (ICDM)
- Full Text: false
- Reviewed:
- Description: One-pass learning updates a model with only a single scan of the dataset, without storing historical data. Previous studies focus on classification tasks with a fixed class set, and perform poorly in an open dynamic environment when new classes emerge in a data stream. The performance degrades because the classifier needs to receive a sufficient number of instances from the new classes to establish a good model, which can take a long period of time. In order to reduce this period and deal with any-time prediction tasks, we introduce a framework to handle emerging new classes called One-Pass Class Incremental Learning (OPCIL). The central issue in OPCIL is how to effectively adapt a classifier of existing classes to incorporate emerging new classes. We call this the new class adaptation issue, and propose a new approach to address it which requires only one new class instance. The key is to generate pseudo instances that are optimized to satisfy properties that produce a good discriminative classifier. We provide the necessary properties and optimization procedures required to address this issue. Experiments validate the effectiveness of this approach.
Overcoming key weaknesses of distance-based neighbourhood methods using a data dependent dissimilarity measure
- Authors: Ting, Kaiming , Zhu, Ye , Carman, Mark , Zhu, Yue , Zhou, Zhi-Hua
- Date: 2016
- Type: Text , Conference paper
- Relation: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco; August 13th-17th, 2016 p. 1205-1214
- Full Text: false
- Reviewed:
- Description: This paper introduces the first generic version of data dependent dissimilarity and shows that it provides a better closest match than distance measures for three existing algorithms in clustering, anomaly detection and multi-label classification. For each algorithm, we show that by simply replacing the distance measure with the data dependent dissimilarity measure, it overcomes a key weakness of the otherwise unchanged algorithm.
LeSiNN : Detecting anomalies by identifying least similar nearest neighbours
- Authors: Pang, Guansong , Ting, Kaiming , Albrecht, David
- Date: 2015
- Type: Text , Conference proceedings , Conference paper
- Relation: 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015; Atlantic City, New Jersey; 14th-17th November 2015; published in Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015
- Full Text: false
- Description: We introduce the concept of Least Similar Nearest Neighbours (LeSiNN) and use LeSiNN to detect anomalies directly. Although there is an existing method which is a special case of LeSiNN, this paper is the first to clearly articulate the underlying concept, as far as we know. LeSiNN is the first ensemble method which works well with models trained using samples of one instance. LeSiNN has linear time complexity with respect to data size and the number of dimensions, and it is one of the few anomaly detectors which can apply directly to both numeric and categorical data sets. Our extensive empirical evaluation shows that LeSiNN is either competitive with or better than six state-of-the-art anomaly detectors in terms of detection accuracy and runtime.
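The single-instance-sample idea described above can be sketched as follows (an illustrative reading of the abstract, not the published algorithm): score a query by its average nearest-neighbour distance over t independent random subsamples, where each subsample may contain just one instance.

```python
import math
import random

def lesinn_score(x, data, t=50, sample_size=1, rng=None):
    """Anomaly score of x: the average distance to its nearest
    neighbour in each of t independent random subsamples. With
    sample_size=1, every 'model' is a single instance, the setting
    highlighted in the paper. Larger scores mean x is less similar to
    its nearest neighbours, i.e. more anomalous."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(t):
        sample = rng.sample(data, sample_size)
        total += min(math.dist(x, s) for s in sample)
    return total / t
```

Each ensemble member only stores its tiny sample, which is consistent with the linear time and constant-per-model space claims in the abstract; handling categorical attributes would need a different per-dimension dissimilarity, which this numeric sketch omits.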
Efficient anomaly detection by isolation using Nearest Neighbour Ensemble
- Authors: Bandaragoda, Tharindu , Ting, Kaiming , Albrecht, David , Liu, Fei , Wells, Jonathan
- Date: 2014
- Type: Text , Conference paper
- Relation: 14th IEEE International Conference on Data Mining Workshop (ICDMW 2014); Shenzhen, China; 14th December 2014 p. 698-705
- Full Text: false
- Reviewed:
- Description: This paper presents iNNE (isolation using Nearest Neighbour Ensemble), an efficient nearest neighbour-based anomaly detection method by isolation. iNNE runs significantly faster than existing nearest neighbour-based methods such as Local Outlier Factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. Compared with the existing tree-based isolation method iForest, the proposed isolation method overcomes three weaknesses of iForest that we have identified, i.e., its inability to detect local anomalies, anomalies with a low number of relevant attributes, and anomalies that are surrounded by normal instances.
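A simplified sketch of the isolation idea, assuming hyperspheres built from small random samples; the published method additionally refines the covered case with a ratio of radii, which this sketch collapses to a 0/1 outcome.

```python
import math
import random

def inne_score(data, query, psi=8, t=50, rng=None):
    """Simplified iNNE-style score. Each ensemble member draws psi
    points; every drawn point c gets a hypersphere whose radius is the
    distance to c's nearest neighbour within the sample. A query
    outside all spheres is fully isolated and scores 1 for that
    member, otherwise 0 here. The final score averages over the t
    members; higher means more anomalous."""
    rng = rng or random.Random(0)
    isolated = 0
    for _ in range(t):
        sample = rng.sample(data, psi)
        radii = [min(math.dist(c, o) for o in sample if o is not c)
                 for c in sample]
        if not any(math.dist(query, c) <= r for c, r in zip(sample, radii)):
            isolated += 1
    return isolated / t
```

Because sphere radii adapt to the local sample spacing, a point can be flagged even when it sits near a dense cluster with small spheres, which is the intuition behind iNNE's ability to detect local anomalies.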
Improving iForest with relative mass
- Authors: Aryal, Sunil , Ting, Kaiming , Wells, Jonathan , Washio, Takashi
- Date: 2014
- Type: Text , Conference paper
- Relation: 18th Pacific-Asia Conference, PAKDD 2014: Advances in Knowledge Discovery and Data Mining; Tainan, Taiwan; 13th-16th May 2014; published in Lecture Notes in Artificial Intelligence (subseries of Lecture Notes in Computer Science) Vol. 8444, p. 510-521
- Full Text: false
- Reviewed:
- Description: iForest uses a collection of isolation trees to detect anomalies. While it is effective in detecting global anomalies, it fails to detect local anomalies in data sets having multiple clusters of normal instances because the local anomalies are masked by normal clusters of similar density and they become less susceptible to isolation. In this paper, we propose a very simple but effective solution to overcome this limitation by replacing the global ranking measure based on path length with a local ranking measure based on relative mass that takes local data distribution into consideration. We demonstrate the utility of relative mass by improving the task specific performance of iForest in anomaly detection and information retrieval tasks.
Mp-dissimilarity : A data dependent dissimilarity measure
- Authors: Aryal, Sunil , Ting, Kaiming , Haffari, Gholamreza , Washio, Takashi
- Date: 2014
- Type: Text , Conference paper
- Relation: 14th IEEE International Conference on Data Mining (2014 ICDM); Shenzhen, China; 14th-17th December 2014 p. 707-712
- Full Text: false
- Reviewed:
- Description: Nearest neighbour search is a core process in many data mining algorithms. Finding reliable closest matches of a query in a high dimensional space is still a challenging task. This is because the effectiveness of many dissimilarity measures that are based on a geometric model, such as the lp-norm, decreases as the number of dimensions increases. In this paper, we examine how the data distribution can be exploited to measure dissimilarity between two instances, and propose a new data dependent dissimilarity measure called 'mp-dissimilarity'. Rather than relying on geometric distance, it measures the dissimilarity between two instances in each dimension as a probability mass in a region that encloses the two instances. It deems two instances in a sparse region to be more similar than two instances in a dense region, even though the two pairs have the same geometric distance. Our empirical results show that the proposed dissimilarity measure indeed provides a reliable nearest neighbour search in high dimensional spaces, particularly in sparse data. Mp-dissimilarity produced better task-specific performance than the lp-norm and cosine distance in classification and information retrieval tasks.
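The per-dimension probability-mass idea can be sketched directly from the description above; the lp-style aggregation across dimensions below is a plausible reading, not necessarily the paper's exact formula.

```python
def mp_dissimilarity(x, y, data, p=2):
    """Data-dependent dissimilarity: in each dimension, the
    dissimilarity of two values is the probability mass of the region
    between them, estimated by the fraction of training instances
    falling inside it. Per-dimension masses are combined lp-norm
    style. Two instances in a sparse region therefore get a smaller
    value than two equally distant instances in a dense region."""
    n, dims = len(data), len(x)
    total = 0.0
    for i in range(dims):
        lo, hi = min(x[i], y[i]), max(x[i], y[i])
        mass = sum(1 for z in data if lo <= z[i] <= hi) / n
        total += mass ** p
    return (total / dims) ** (1.0 / p)
```

For instance, with training values clustered near 0 and a few stragglers at 5 and 9, the pair (0.0, 0.05) spans most of the dense mass and is deemed more dissimilar than the pair (5.0, 5.05), even though both pairs are 0.05 apart geometrically.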
MassBayes: a new generative classifier with multi-dimensional likelihood estimation
- Authors: Aryal, Sunil , Ting, Kaiming
- Date: 2013
- Type: Text , Conference paper
- Relation: Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference p. 136-148
- Full Text: false
- Reviewed:
- Description: Existing generative classifiers (e.g., BayesNet and AnDE) make independence assumptions and estimate one-dimensional likelihood. This paper presents a new generative classifier called MassBayes that estimates multi-dimensional likelihood without making any explicit assumptions. It aggregates the multi-dimensional likelihoods estimated from random subsets of the training data using varying size random feature subsets. Our empirical evaluations show that MassBayes yields better classification accuracy than the existing generative classifiers in large data sets. As it works with fixed-size subsets of training data, it has constant training time complexity and constant space complexity, and it can easily scale up to very large data sets.
Optimizing cepstral features for audio classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Conference paper
- Relation: International Joint Conference on Artificial Intelligence p. 1330-1336
- Full Text: false
- Reviewed:
- Description: Cepstral features have been widely used in audio applications. Domain knowledge has played an important role in designing the different types of cepstral features proposed in the literature. In this paper, we present a novel approach for learning optimized cepstral features directly from audio data to better discriminate between different categories of signals in classification tasks. We employ multi-layer feedforward neural networks to model the cepstral feature extraction process. The network weights are initialized to replicate a reference cepstral feature such as the mel-frequency cepstral coefficients (MFCC). We then propose an embedded approach that integrates feature learning with the training of a support vector machine (SVM) classifier. A single optimization problem is formulated where the feature and classifier variables are optimized simultaneously so as to refine the initial features and minimize the classification risk. Experimental results have demonstrated the effectiveness of the proposed feature learning approach, outperforming competing methods by a large margin on benchmark data.
A non-time series approach to vehicle related time series problems
- Authors: Wells, Jonathan , Ting, Kaiming , Naiwala, Chandrasiri
- Date: 2012
- Type: Text , Conference paper
- Relation: 10th Australasian Data Mining Conference (AusDM 2012); Sydney, Australia; 5th-7th December 2012; published in Conferences in Research and Practice in Information Technology (CRPIT) Vol. 134, p. 61-70
- Full Text:
- Reviewed:
- Description: This paper shows that some time series problems can be better served as non-time series problems. We used two unsupervised learning anomaly detectors to analyse a vehicle related time series problem and showed that non-time series treatment produced a better outcome than a time series treatment. We also present the benefits of using unsupervised methods over semi-supervised or supervised learning methods, and rule-based methods.
Learning sparse kernel classifiers in the primal
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop, SSPR&SPR 2012; Hiroshima, Japan; 7th-9th November 2012; published in Structural, Syntactic, and Statistical Pattern Recognition (part of the Lecture Notes in Computer Science) Vol. 7626, p. 60-69
- Full Text: false
- Reviewed:
- Description: The increasing number of classification applications in large data sets demands that efficient classifiers be designed not only for training but also for prediction. In this paper, we address the problem of learning kernel classifiers with reduced complexity and improved efficiency for prediction in comparison to those trained by standard methods. A single optimisation problem is formulated for classifier learning which optimises both classifier weights and eXpansion Vectors (XVs) that define the classification function in a joint fashion. Unlike the existing approach of Wu et al., which performs optimisation in the dual formulation, our approach solves the primal problem directly. The primal problem is much more efficient to solve, as it can be converted to the training of a linear classifier in each iteration, which scales linearly with the size of the data set and the number of expansions. This makes our primal approach highly desirable for large-scale applications, where the dual approach is inadequate and prohibitively slow due to the solution of a cubic-time kernel SVM in each iteration. Experimental results have demonstrated the efficiency and effectiveness of the proposed primal approach for learning sparse kernel classifiers that clearly outperform the alternatives.
Building sparse support vector machines for multi-instance classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: European Conference on Machine Learning Knowledge Discovery in Databases (ECML PKDD) p. 471-486
- Full Text: false
- Reviewed:
- Description: We propose a direct approach to learning sparse Support Vector Machine (SVM) prediction models for Multi-Instance (MI) classification. The proposed sparse SVM is based on a “label-mean” formulation of MI classification which takes the average of predictions of individual instances for bag-level prediction. This leads to a convex optimization problem, which is essential for the tractability of the optimization problem arising from the sparse SVM formulation we derived subsequently, as well as the validity of the optimization strategy we employed to solve it. Based on the “label-mean” formulation, we can build sparse SVM models for MI classification and explicitly control their sparsities by enforcing the maximum number of expansions allowed in the prediction function. An effective optimization strategy is adopted to solve the formulated sparse learning problem which involves the learning of both the classifier and the expansion vectors. Experimental results on benchmark data sets have demonstrated that the proposed approach is effective in building very sparse SVM models while achieving comparable performance to the state-of-the-art MI classifiers.
Clustering gene expression data using ant-based heuristics
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2011
- Type: Text , Conference paper
- Relation: IEEE Congress on Evolutionary Computation (IEEE CEC) 2011 p. 1-8
- Full Text: false
- Reviewed:
- Description: We consider the problem of finding the clusters in novel datasets in which the number of clusters is not known a priori, and little or no additional information is available for users to adjust the parameters in a clustering algorithm. We address this problem using a stochastic algorithm named SATTA (Simplified Adaptive Time Dependent Transporter), which attempts to find clusters without requiring users to specify the number of clusters or adjust any parameters. SATTA is then compared with Expectation Maximization Clustering (EMC), which is also able to estimate the number of clusters using the principle of maximum likelihood and find the underlying clusters without any human intervention. Our results on seven gene expression datasets show that SATTA significantly outperforms EMC in terms of clustering accuracy and efficiency. We discuss the conceptual differences between SATTA and EMC, which suggest that SATTA is a more promising alternative when little or no additional information is available for clustering novel datasets.
Density estimation based on mass
- Authors: Ting, Kaiming , Washio, Takashi , Wells, Jonathan , Liu, Fei
- Date: 2011
- Type: Text , Conference paper
- Relation: 11th IEEE International Conference on Data Mining (ICDM 2011) p. 715-724
- Full Text: false
- Reviewed:
- Description: Density estimation is the ubiquitous base modelling mechanism employed for many tasks such as clustering, classification, anomaly detection and information retrieval. Commonly used density estimation methods such as kernel density estimator and k-nearest neighbour density estimator have high time and space complexities which render them inapplicable in problems with large data size and even a moderate number of dimensions. This weakness sets the fundamental limit in existing algorithms for all these tasks. We propose the first density estimation method which stretches this fundamental limit to an extent that dealing with millions of data can now be done easily and quickly. We analyze the error of the new estimation (from the true density) using a bias-variance analysis. We then perform an empirical evaluation of the proposed method by replacing existing density estimators with the new one in two current density-based algorithms, namely, DBSCAN and LOF. The results show that the new density estimation method significantly improves the runtime of DBSCAN and LOF, while maintaining or improving their task-specific performances in clustering and anomaly detection, respectively. The new method empowers these algorithms, currently limited to small data size only, to process very large databases - setting a new benchmark for what density-based algorithms can achieve.
Fast anomaly detection for streaming data
- Authors: Tan, Swee , Ting, Kaiming , Liu, Fei
- Date: 2011
- Type: Text , Conference paper
- Relation: International Joint Conference on Artificial Intelligence (IJCAI) p. 1511-1516
- Full Text: false
- Reviewed:
- Description: This paper introduces Streaming Half-Space-Trees (HS-Trees), a fast one-class anomaly detector for evolving data streams. It requires only normal data for training and works well when anomalous data are rare. The model features an ensemble of random HS-Trees, and the tree structure is constructed without any data. This makes the method highly efficient because it requires no model restructuring when adapting to evolving data streams. Our analysis shows that Streaming HS-Trees has constant amortised time complexity and constant memory requirement. When compared with a state-of-the-art method, our method performs favourably in terms of detection accuracy and runtime performance. Our experimental results also show that the detection performance of Streaming HS-Trees is not sensitive to its parameter settings.
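The key trick, constructing the tree structure without any data, can be sketched as follows. This is a stripped-down illustration: the published scoring aggregates mass along the path with depth weighting and maintains reference and latest windows for evolving streams, both of which this sketch omits.

```python
import random

def build_hs_tree(mins, maxs, depth, rng):
    """Build a half-space tree without any data: each internal node
    picks a random attribute and splits its current work range at the
    midpoint; leaves hold a mass counter."""
    if depth == 0:
        return {"mass": 0}
    q = rng.randrange(len(mins))
    split = (mins[q] + maxs[q]) / 2.0
    left_maxs, right_mins = list(maxs), list(mins)
    left_maxs[q], right_mins[q] = split, split
    return {"q": q, "split": split,
            "left": build_hs_tree(mins, left_maxs, depth - 1, rng),
            "right": build_hs_tree(right_mins, maxs, depth - 1, rng)}

def _leaf(tree, x):
    # Descend to the leaf whose half-space region contains x.
    while "q" in tree:
        tree = tree["left"] if x[tree["q"]] < tree["split"] else tree["right"]
    return tree

def update(tree, x):
    """Record one streaming instance by incrementing its leaf mass."""
    _leaf(tree, x)["mass"] += 1

def score(tree, x):
    """Mass of the leaf containing x: low mass = anomalous region."""
    return _leaf(tree, x)["mass"]
```

Because the tree is fixed before any data arrive, adapting to an evolving stream only requires resetting or swapping mass counters, never restructuring the tree, which is where the constant amortised time and memory claims come from.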
On low-rank regularized least squares for scalable nonlinear classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: International Conference on Neural Information Processing p. 490-499
- Full Text: false
- Reviewed:
- Description: In this paper, we revisit the classical technique of Regularized Least Squares (RLS) for the classification of large-scale nonlinear data. Specifically, we focus on a low-rank formulation of RLS and show that it has linear time complexity in the data size alone, independent of the number of labels and features, for problems with moderate feature dimension. This makes low-rank RLS particularly suitable for classification with large data sets. Moreover, we propose a general theorem for the closed-form solutions to the Leave-One-Out Cross Validation (LOOCV) estimation problem in empirical risk minimization which encompasses all types of RLS classifiers as special cases. This eliminates the reliance on cross validation, a computationally expensive process for parameter selection, and greatly accelerates the training process of RLS classifiers. Experimental results on real and synthetic large-scale benchmark data sets show that low-rank RLS achieves comparable classification performance while being much more efficient than standard kernel SVM for nonlinear classification. The improvement in efficiency is more evident for data sets with higher dimensions.
Simplifying and improving ant-based clustering
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2011
- Type: Text , Conference paper
- Relation: 11th International Conference on Computational Science, ICCS 2011; Singapore, Singapore; 1st-3rd June 2011, published in Procedia Computer Science Vol. 4, p. 46-55
- Full Text:
- Reviewed:
- Description: Ant-based clustering (ABC) is a data clustering approach inspired by cemetery formation activities observed in real ant colonies. Building upon the premise of collective intelligence, such an approach uses multiple ant-like agents and a mixture of heuristics in order to create systems that are capable of clustering real-world data. Many recently proposed ABC systems have shown competitive results, but these systems are geared towards adding new heuristics, resulting in increasingly complex systems that are harder to understand and improve. In contrast to this direction, we demonstrate that a state-of-the-art ABC system can be systematically evaluated and then simplified. The streamlined model, which we call SABC, differs fundamentally from traditional ABC systems as it does not use the ant colony and several other key components. Yet, our empirical study shows that SABC performs more effectively and efficiently than the state-of-the-art ABC system.