Grouping points by shared subspaces for effective subspace clustering
- Authors: Zhu, Ye , Ting, Kaiming , Carman, Mark
- Date: 2018
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 83, no. (2018), p. 230-244
- Full Text: false
- Reviewed:
- Description: Clusters may exist in different subspaces of a multidimensional dataset. Traditional full-space clustering algorithms have difficulty in identifying these clusters. Various subspace clustering algorithms have used different subspace search strategies. They require clustering to assess whether cluster(s) exist in a subspace. In addition, all of them perform clustering by measuring similarity between points in the given feature space. As a result, the subspace selection and clustering processes are tightly coupled. In this paper, we propose a new subspace clustering framework named CSSub (Clustering by Shared Subspaces). It enables neighbouring core points to be clustered based on the number of subspaces they share. It explicitly splits candidate subspace selection and clustering into two separate processes, enabling different types of cluster definitions to be employed easily. Through extensive experiments on synthetic and real-world datasets, we demonstrate that CSSub discovers non-redundant subspace clusters with arbitrary shapes in noisy data; and it significantly outperforms existing state-of-the-art subspace clustering algorithms.
Isolation-based anomaly detection using nearest-neighbor ensembles
- Authors: Bandaragoda, Tharindu , Ting, Kaiming , Albrecht, David , Liu, Fei , Zhu, Ye , Wells, Jonathan
- Date: 2018
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 34, no. 4 (2018), p. 968-998
- Full Text: false
- Reviewed:
- Description: The first successful isolation-based anomaly detector, ie, iForest, uses trees as a means to perform isolation. Although it has been shown to have advantages over existing anomaly detectors, we have identified 4 weaknesses, ie, its inability to detect local anomalies, anomalies with a high percentage of irrelevant attributes, anomalies that are masked by axis-parallel clusters, and anomalies in multimodal data sets. To overcome these weaknesses, this paper shows that an alternative isolation mechanism is required and thus presents iNNE or isolation using Nearest Neighbor Ensemble. Although relying on nearest neighbors, iNNE runs significantly faster than the existing nearest neighbor–based methods such as the local outlier factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. © 2018 Wiley Periodicals, Inc.
Local contrast as an effective means to robust clustering against varying densities
- Authors: Chen, Bo , Ting, Kaiming , Washio, Takashi , Zhu, Ye
- Date: 2018
- Type: Text , Journal article
- Relation: Machine Learning Vol. 107, no. 8-10 (2018), p. 1621-1645
- Full Text:
- Reviewed:
- Description: Most density-based clustering methods have difficulties detecting clusters of hugely different densities in a dataset. A recent density-based clustering CFSFDP appears to have mitigated the issue. However, through formalising the condition under which it fails, we reveal that CFSFDP still has the same issue. To address this issue, we propose a new measure called Local Contrast, as an alternative to density, to find cluster centers and detect clusters. We then apply Local Contrast to CFSFDP, and create a new clustering method called LC-CFSFDP which is robust in the presence of varying densities. Our empirical evaluation shows that LC-CFSFDP outperforms CFSFDP and three other state-of-the-art variants of CFSFDP. © 2018, The Author(s).
Data-dependent dissimilarity measure : An effective alternative to geometric distance measures
- Authors: Aryal, Sunil , Ting, Kaiming , Washio, Takashi , Haffari, Gholamreza
- Date: 2017
- Type: Text , Journal article
- Relation: Knowledge and Information Systems Vol. 53, no. 2 (2017), p. 479-506
- Full Text: false
- Reviewed:
- Description: Nearest neighbor search is a core process in many data mining algorithms. Finding reliable closest matches of a test instance is still a challenging task as the effectiveness of many general-purpose distance measures such as ℓp -norm decreases as the number of dimensions increases. Their performances vary significantly in different data distributions. This is mainly because they compute the distance between two instances solely based on their geometric positions in the feature space, and data distribution has no influence on the distance measure. This paper presents a simple data-dependent general-purpose dissimilarity measure called ‘ mp -dissimilarity’. Rather than relying on geometric distance, it measures the dissimilarity between two instances as a probability mass in a region that encloses the two instances in every dimension. It deems two instances in a sparse region to be more similar than two instances of equal inter-point geometric distance in a dense region. Our empirical results in k-NN classification and content-based multimedia information retrieval tasks show that the proposed mp -dissimilarity measure produces better task-specific performance than existing widely used general-purpose distance measures such as ℓp -norm and cosine distance across a wide range of moderate- to high-dimensional data sets with continuous only, discrete only, and mixed attributes.
Defying the gravity of learning curve : A characteristic of nearest neighbour anomaly detectors
- Authors: Ting, Kaiming , Washio, Takashi , Wells, Jonathan , Aryal, Sunil
- Date: 2017
- Type: Text , Journal article
- Relation: Machine Learning Vol. 106, no. 1 (2017), p. 55-91
- Full Text: false
- Reviewed:
- Description: Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve which is often colloquially referred to as ‘more data the better’. We call this ‘the gravity of learning curve’, and it is assumed that no learning algorithms are ‘gravity-defiant’. Contrary to the conventional wisdom, this paper provides the theoretical analysis and the empirical evidence that nearest neighbour anomaly detectors are gravity-defiant algorithms.
A generic ensemble approach to estimate multidimensional likelihood in Bayesian classifier learning
- Authors: Aryal, Sunil , Ting, Kaiming
- Date: 2016
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 32, no. 3 (2016), p. 458-479
- Full Text: false
- Reviewed:
- Description: In Bayesian classifier learning, estimating the joint probability distribution (,) or the likelihood (|) directly from training data is considered to be difficult, especially in large multidimensional data sets. To circumvent this difficulty, existing Bayesian classifiers such as Naive Bayes, BayesNet, and ADE have focused on estimating simplified surrogates of (,) from different forms of one‐dimensional likelihoods. Contrary to the perceived difficulty in multidimensional likelihood estimation, we present a simple generic ensemble approach to estimate multidimensional likelihood directly from data. The idea is to aggregate (|) estimated from a random subsample of data . This article presents two ways to estimate multidimensional likelihoods using the proposed generic approach and introduces two new Bayesian classifiers called and that estimate (|) using a nearest‐neighbor density estimation and a probability estimation through feature space partitioning, respectively. Unlike the existing Bayesian classifiers, ENNBayes and MassBayes have constant training time and space complexities and they scale better than existing Bayesian classifiers in very large data sets. Our empirical evaluation shows that ENNBayes and MassBayes yield better predictive accuracy than the existing Bayesian classifiers in benchmark data sets.
Commentary : A decomposition of the outlier detection problem into a set of supervised learning problems
- Authors: Zhu, Ye , Ting, Kaiming
- Date: 2016
- Type: Text , Journal article
- Relation: Machine Learning Vol. 105, no. 2 (2016), p. 301-304
- Full Text: false
- Reviewed:
- Description: This article discusses the material in relation to iForest (Liu et al. in ACM Trans Knowl Discov Data 6(1):3, 2012) reported in a recent Machine Learning Journal paper by Paulheim and Meusel (Mach Learn 100(2–3):509–531, 2015). It presents an empirical comparison result of iForest using the default parameter settings suggested by its creator (Liu et al. 2012) and iForest using the settings employed by Paulheim and Meusel (2015). This comparison has an impact on the conclusion made by Paulheim and Meusel (2015). © 2016, The Author(s).
Density-ratio based clustering for discovering clusters with varying densities
- Authors: Zhu, Ye , Ting, Kaiming , Carman, Mark
- Date: 2016
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 60, no. (2016), p. 983-997
- Full Text: false
- Reviewed:
- Description: Density-based clustering algorithms are able to identify clusters of arbitrary shapes and sizes in a dataset which contains noise. It is well-known that most of these algorithms, which use a global density threshold, have difficulty identifying all clusters in a dataset having clusters of greatly varying densities. This paper identifies and analyses the condition under which density-based clustering algorithms fail in this scenario. It proposes a density-ratio based method to overcome this weakness, and reveals that it can be implemented in two approaches. One approach is to modify a density-based clustering algorithm to do density-ratio based clustering by using its density estimator to compute density-ratio. The other approach involves rescaling the given dataset only. An existing density-based clustering algorithm, which is applied to the rescaled dataset, can find all clusters with varying densities that would otherwise impossible had the same algorithm been applied to the unscaled dataset. We provide an empirical evaluation using DBSCAN, OPTICS and SNN to show the effectiveness of these two approaches. © 2016 Elsevier Ltd
ZERO++ : Harnessing the power of zero appearances to detect anomalies in large-scale data sets
- Authors: Pang, Guansong , Ting, Kaiming , Albrecht, David , Jin, Huidong
- Date: 2016
- Type: Text , Journal article
- Relation: Journal of Artificial Intelligence Research Vol. 57, no. (2016), p. 593-620
- Full Text: false
- Reviewed:
- Description: This paper introduces a new unsupervised anomaly detector called ZERO++ which employs the number of zero appearances in subspaces to detect anomalies in categorical data. It is unique in that it works in regions of subspaces that are not occupied by data; whereas existing methods work in regions occupied by data. ZERO++ examines only a small number of low dimensional subspaces to successfully identify anomalies. Unlike existing frequency-based algorithms, ZERO++ does not involve subspace pattern searching. We show that ZERO++ is better than or comparable with the state-of-the-art anomaly detection methods over a wide range of real-world categorical and numeric data sets; and it is efficient with linear time complexity and constant space complexity which make it a suitable candidate for large-scale data sets.
Half-space mass : a maximally robust and efficient data depth method
- Authors: Chen, Bo , Ting, Kaiming , Washio, Takashi , Haffari, Gholamreza
- Date: 2015
- Type: Text , Journal article
- Relation: Machine Learning Vol. 100, no. 2-3 (2015), p. 677-699
- Full Text: false
- Reviewed:
- Description: Data depth is a statistical method which models data distribution in terms of center-outward ranking rather than density or linear ranking. While there are a lot of academic interests, its applications are hampered by the lack of a method which is both robust and efficient. This paper introduces Half-Space Mass which is a significantly improved version of half-space data depth. Half-Space Mass is the only data depth method which is both robust and efficient, as far as we know. We also reveal four theoretical properties of Half-Space Mass: (i) its resultant mass distribution is concave regardless of the underlying density distribution, (ii) its maximum point is unique which can be considered as median, (iii) the median is maximally robust, and (iv) its estimation extends to a higher dimensional space in which the convex hull of the dataset occupies zero volume. We demonstrate the power of Half-Space Mass through its applications in two tasks. In anomaly detection, being a maximally robust location estimator leads directly to a robust anomaly detector that yields a better detection accuracy than half-space depth; and it runs orders of magnitude faster than L2 depth, an existing maximally robust location estimator. In clustering, the Half-Space Mass version of K-means overcomes three weaknesses of K-means.
- Description: Data depth is a statistical method which models data distribution in terms of center-outward ranking rather than density or linear ranking. While there are a lot of academic interests, its applications are hampered by the lack of a method which is both robust and efficient. This paper introduces
LiNearN : A new approach to nearest neighbour density estimator
- Authors: Wells, Jonathan , Ting, Kaiming , Washio, Takashi
- Date: 2014
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 47, no. 8 (2014), p. 2702-2720
- Full Text: false
- Reviewed:
- Description: Despite their wide spread use, nearest neighbour density estimators have two fundamental limitations: O(n2) time complexity and O(n) space complexity. Both limitations constrain nearest neighbour density estimators to small data sets only. Recent progress using indexing schemes has improved to near linear time complexity only.We propose a new approach, called LiNearN for Linear time Nearest Neighbour algorithm, that yields the first nearest neighbour density estimator having O(n) time complexity and constant space complexity, as far as we know. This is achieved without using any indexing scheme because LiNearN uses a subsampling approach for which the subsample values are significantly less than the data size. Like existing density estimators, our asymptotic analysis reveals that the new density estimator has a parameter to trade off between bias and variance. We show that algorithms based on the new nearest neighbour density estimator can easily scale up to data sets with millions of instances in anomaly detection and clustering tasks. Highlights•Reject the premise that a NN algorithm must find the NN for every instance.•The first NN density estimator that has O(n) time complexity and O(1) space complexity.•These complexities are achieved without using any indexing scheme.•Our asymptotic analysis reveals that it trades off between bias and variance.•Easily scales up to large data sets in anomaly detection and clustering tasks.
DEMass: a new density estimator for big data
- Authors: Ting, Kaiming , Washio, Takashi , Wells, Jonathan , Liu, Fei , Aryal, Sunil
- Date: 2013
- Type: Text , Journal article
- Relation: Knowledge and Information Systems Vol. 35, no. 3 (2013), p. 493-524
- Full Text: false
- Reviewed:
- Description: Density estimation is the ubiquitous base modelling mechanism employed for many tasks including clustering, classification, anomaly detection and information retrieval. Commonly used density estimation methods such as kernel density estimator and k-nearest neighbour density estimator have high time and space complexities which render them inapplicable in problems with big data. This weakness sets the fundamental limit in existing algorithms for all these tasks. We propose the first density estimation method, having average case sub-linear time complexity and constant space complexity in the number of instances, that stretches this fundamental limit to an extent that dealing with millions of data can now be done easily and quickly. We provide an asymptotic analysis of the new density estimator and verify the generality of the method by replacing existing density estimators with the new one in three current density-based algorithms, namely DBSCAN, LOF and Bayesian classifiers, representing three different data mining tasks of clustering, anomaly detection and classification. Our empirical evaluation results show that the new density estimation method significantly improves their time and space complexities, while maintaining or improving their task-specific performances in clustering, anomaly detection and classification. The new method empowers these algorithms, currently limited to small data size only, to process big data—setting a new benchmark for what density-based algorithms can achieve.
Efficient nonlinear classification via low-rank regularised least squares
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 22, no. 7-8(2013), p. 1279-1289
- Full Text: false
- Reviewed:
- Description: We revisit the classical technique of regularised least squares (RLS) for nonlinear classification in this paper. Specifically, we focus on a low-rank formulation of the RLS, which has linear time complexity in the size of data set only, independent of both the number of classes and number of features. This makes low-rank RLS particularly suitable for problems with large data and moderate feature dimensions. Moreover, we have proposed a general theorem for obtaining the closed-form estimation of prediction values on a holdout validation set given the low-rank RLS classifier trained on the whole training data. It is thus possible to obtain an error estimate for each parameter setting without retraining and greatly accelerate the process of cross-validation for parameter selection. Experimental results on several large-scale benchmark data sets have shown that low-rank RLS achieves comparable classification performance while being much more efficient than standard kernel SVM for nonlinear classification. The improvement in efficiency is more evident for data sets with higher dimensions.
Learning sparse kernel classifiers for multi-instance classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Journal article
- Relation: IEEE Transactions on Neural Networks and Learning Systems Vol. 24, no. 9 (2013), p. 1377-1389
- Full Text: false
- Reviewed:
- Description: We propose a direct approach to learning sparse kernel classifiers for multi-instance (MI) classification to improve efficiency while maintaining predictive accuracy. The proposed method builds on a convex formulation for MI classification by considering the average score of individual instances for bag-level prediction. In contrast, existing formulations used the maximum score of individual instances in each bag, which leads to nonconvex optimization problems. Based on the convex MI framework, we formulate a sparse kernel learning algorithm by imposing additional constraints on the objective function to enforce the maximum number of expansions allowed in the prediction function. The formulated sparse learning problem for the MI classification is convex with respect to the classifier weights. Therefore, we can employ an effective optimization strategy to solve the optimization problem that involves the joint learning of both the classifier and the expansion vectors. In addition, the proposed formulation can explicitly control the complexity of the prediction model while still maintaining competitive predictive performance. Experimental results on benchmark data sets demonstrate that our proposed approach is effective in building very sparse kernel classifiers while achieving comparable performance to the state-of-the-art MI classifiers.
Local models - the key to boosting stable learners successfully
- Authors: Ting, Kaiming , Zhu, Lian , Wells, Jonathan
- Date: 2013
- Type: Text , Journal article
- Relation: Computational Intelligence Vol. 29, no. 2 (2013), p. 331-356
- Full Text: false
- Reviewed:
- Description: Boosting has been shown to improve the predictive performance of unstable learners such as decision trees, but not of stable learners like Support Vector Machines (SVM), k-nearest neighbours and Naive Bayes classifiers. In addition to the model stability problem, the high time complexity of some stable learners such as SVM prohibits them from generating multiple models to form an ensemble for large data sets. This paper introduces a simple method that not only enables Boosting to improve the predictive performance of stable learners, but also significantly reduces the computational time to generate an ensemble of stable learners such as SVM for large data sets that would otherwise be infeasible. The method proposes to build local models, instead of global models; and it is the first method, to the best of our knowledge, to solve the two problems in Boosting stable learners at the same time. We implement the method by using a decision tree to define local regions and build a local model for each local region. We show that this implementation of the proposed method enables successful Boosting of three types of stable learners: SVM, k-nearest neighbours and Naive Bayes classifiers.
- Description: Boosting has been shown to improve the predictive performance of unstable learners such as decision trees, but not of stable learners like Support Vector Machines (SVM), k-nearest neighbors and Naive Bayes classifiers. In addition to the model stability problem, the high time complexity of some stable learners such as SVM prohibits them from generating multiple models to form an ensemble for large data sets. This paper introduces a simple method that not only enables Boosting to improve the predictive performance of stable learners, but also significantly reduces the computational time to generate an ensemble of stable learners such as SVM for large data sets that would otherwise be infeasible. The method proposes to build local models, instead of global models; and it is the first method, to the best of our knowledge, to solve the two problems in Boosting stable learners at the same time. We implement the method by using a decision tree to define local regions and build a local model for each local region. We show that this implementation of the proposed method enables successful Boosting of three types of stable learners: SVM, k-nearest neighbors and Naive Bayes classifiers.
Mass estimation
- Authors: Ting, Kaiming , Zhou, Guang , Liu, Fei , Tan, Swee
- Date: 2013
- Type: Text , Journal article
- Relation: Machine Learning Vol. 90, no. 1 (2013), p. 127-160
- Full Text: false
- Reviewed:
- Description: This paper introduces mass estimation—a base modelling mechanism that can be employed to solve various tasks in machine learning. We present the theoretical basis of mass and efficient methods to estimate mass. We show that mass estimation solves problems effectively in tasks such as information retrieval, regression and anomaly detection. The models, which use mass in these three tasks, perform at least as well as and often better than eight state-of-the-art methods in terms of task-specific performance measures. In addition, mass estimation has constant time and space complexities.
MassBayes: a new generative classifier with multi-dimensional likelihood estimation
- Authors: Aryal, Sunil , Ting, Kaiming
- Date: 2013
- Type: Text , Conference paper
- Relation: Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference p. 136-148
- Full Text: false
- Reviewed:
- Description: Existing generative classifiers (e.g., BayesNet and AnDE) make independence assumptions and estimate one-dimensional likelihood. This paper presents a new generative classifier called MassBayes that estimates multi-dimensional likelihood without making any explicit assumptions. It aggregates the multi-dimensional likelihoods estimated from random subsets of the training data using varying size random feature subsets. Our empirical evaluations show that MassBayes yields better classification accuracy than the existing generative classifiers in large data sets. As it works with fixed-size subsets of training data, it has constant training time complexity and constant space complexity, and it can easily scale up to very large data sets.
Optimizing cepstral features for audio classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Conference paper
- Relation: International Joint Conference on Artificial Intelligence p. 1330-1336
- Full Text: false
- Reviewed:
- Description: Cepstral features have been widely used in audio applications. Domain knowledge has played an important role in designing different types of cepstral features proposed in the literature. In this paper, we present a novel approach for learning optimized cepstral features directly from audio data to better discriminate between different categories of signals in classification tasks. We employ multi-layer feedforward neural networks to model the cepstral feature extraction process. The network weights are initialized to replicate a reference cepstral feature like the mel frequency cepstral coefficient. We then propose a embedded approach that integrates feature learning with the training of a support vector machine (SVM) classifier. A single optimization problem is formulated where the feature and classifier variables are optimized simultaneously so as to refine the initial features and minimize the classification risk. Experimental results have demonstrated the effectiveness of the proposed feature learning approach, outperforming competing methods by a large margin on benchmark data.
Learning by extrapolation from marginal to full-multivariate probability distributions: Decreasingly naive Bayesian classification
- Authors: Webb, Geoffrey , Boughton, Janice , Zheng, Fei , Ting, Kaiming , Salem, Houssam
- Date: 2012
- Type: Text , Journal article
- Relation: Machine Learning Vol. 86, no. 2 (2012), p.233-272
- Full Text: false
- Reviewed:
- Description: Averaged n-Dependence Estimators (AnDE) is an approach to probabilistic classification learning that learns by extrapolation from marginal to full-multivariate probability distributions. It utilizes a single parameter that transforms the approach between a low-variance high-bias learner (Naive Bayes) and a high-variance low-bias learner with Bayes optimal asymptotic error. It extends the underlying strategy of Averaged One-Dependence Estimators (AODE), which relaxes the Naive Bayes independence assumption while retaining many of Naive Bayes’ desirable computational and theoretical properties. AnDE further relaxes the independence assumption by generalizing AODE to higher-levels of dependence. Extensive experimental evaluation shows that the bias-variance trade-off for Averaged 2-Dependence Estimators results in strong predictive accuracy over a wide range of data sets. It has training time linear with respect to the number of examples, learns in a single pass through the training data, supports incremental learning, handles directly missing values, and is robust in the face of noise. Beyond the practical utility of its lower-dimensional variants, AnDE is of interest in that it demonstrates that it is possible to create low-bias high-variance generative learners and suggests strategies for developing even more powerful classifiers.
Relevance feature mapping for content-based multimedia information retrieval
- Authors: Zhou, Guang , Ting, Kaiming , Liu, Fei , Yin, Yilong
- Date: 2012
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 45, no. 4 (2012), p. 1707-1720
- Full Text: false
- Reviewed:
- Description: This paper presents a novel ranking framework for content-based multimedia information retrieval (CBMIR). The framework introduces relevance features and a new ranking scheme. Each relevance feature measures the relevance of an instance with respect to a profile of the targeted multimedia database. We show that the task of CBMIR can be done more effectively using the relevance features than the original features. Furthermore, additional performance gain is achieved by incorporating our new ranking scheme which modifies instance rankings based on the weighted average of relevance feature values. Experiments on image and music databases validate the efficacy and efficiency of the proposed framework.