List of Titles

Defying the gravity of learning curve : A characteristic of nearest neighbour anomaly detectors

Authors: Ting, Kaiming , Washio, Takashi , Wells, Jonathan , Aryal, Sunil
Date: 2017
Type: Text , Journal article
Relation: Machine Learning Vol. 106, no. 1 (2017), p. 55-91
Full Text: false
Reviewed:
Description: Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve which is often colloquially referred to as ‘more data the better’. We call this ‘the gravity of learning curve’, and it is assumed that no learning algorithms are ‘gravity-defiant’. Contrary to the conventional wisdom, this paper provides the theoretical analysis and the empirical evidence that nearest neighbour anomaly detectors are gravity-defiant algorithms.

Commentary : A decomposition of the outlier detection problem into a set of supervised learning problems

Authors: Zhu, Ye , Ting, Kaiming
Date: 2016
Type: Text , Journal article
Relation: Machine Learning Vol. 105, no. 2 (2016), p. 301-304
Full Text: false
Reviewed:
Description: This article discusses the material relating to iForest (Liu et al. in ACM Trans Knowl Discov Data 6(1):3, 2012) reported in a recent Machine Learning Journal paper by Paulheim and Meusel (Mach Learn 100(2–3):509–531, 2015). It presents an empirical comparison between iForest using the default parameter settings suggested by its creators (Liu et al. 2012) and iForest using the settings employed by Paulheim and Meusel (2015). This comparison has an impact on the conclusion made by Paulheim and Meusel (2015). © 2016, The Author(s).

Half-space mass : a maximally robust and efficient data depth method

Authors: Chen, Bo , Ting, Kaiming , Washio, Takashi , Haffari, Gholamreza
Date: 2015
Type: Text , Journal article
Relation: Machine Learning Vol. 100, no. 2-3 (2015), p. 677-699
Full Text: false
Reviewed:
Description: Data depth is a statistical method which models data distribution in terms of center-outward ranking rather than density or linear ranking. While it has attracted considerable academic interest, its applications are hampered by the lack of a method which is both robust and efficient. This paper introduces Half-Space Mass, a significantly improved version of half-space data depth. As far as we know, Half-Space Mass is the only data depth method which is both robust and efficient. We also reveal four theoretical properties of Half-Space Mass: (i) its resultant mass distribution is concave regardless of the underlying density distribution, (ii) its maximum point is unique and can be regarded as the median, (iii) the median is maximally robust, and (iv) its estimation extends to a higher dimensional space in which the convex hull of the dataset occupies zero volume. We demonstrate the power of Half-Space Mass through its applications in two tasks. In anomaly detection, being a maximally robust location estimator leads directly to a robust anomaly detector that yields better detection accuracy than half-space depth, and it runs orders of magnitude faster than L2 depth, an existing maximally robust location estimator. In clustering, the Half-Space Mass version of K-means overcomes three weaknesses of K-means.

Mass estimation

Authors: Ting, Kaiming , Zhou, Guang , Liu, Fei , Tan, Swee
Date: 2013
Type: Text , Journal article
Relation: Machine Learning Vol. 90, no. 1 (2013), p. 127-160
Full Text: false
Reviewed:
Description: This paper introduces mass estimation—a base modelling mechanism that can be employed to solve various tasks in machine learning. We present the theoretical basis of mass and efficient methods to estimate mass. We show that mass estimation solves problems effectively in tasks such as information retrieval, regression and anomaly detection. The models, which use mass in these three tasks, perform at least as well as and often better than eight state-of-the-art methods in terms of task-specific performance measures. In addition, mass estimation has constant time and space complexities.
