PU-shapelets : Towards pattern-based positive unlabeled classification of time series
- Authors: Liang, Shen; Zhang, Yanchun; Ma, Jiangang
- Date: 2019
- Type: Text, Conference proceedings, Conference paper
- Relation: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019; Chiang Mai, Thailand; 22nd-25th April 2019; part of the Lecture Notes in Computer Science book series, also part of the Information Systems and Applications, incl. Internet/Web and HCI sub series Vol. 11446 LNCS, p. 87-103
- Description: Real-world time series classification applications often involve positive unlabeled (PU) training data, consisting of only a small set PL of positive labeled examples and a large set U of unlabeled ones. Most existing time series PU classification methods use all readings in the time series, making them sensitive to non-characteristic readings. Characteristic patterns named shapelets present a promising solution to this problem, yet discovering shapelets under PU settings is not easy. In this paper, we take on the challenging task of shapelet discovery with PU data. We propose a novel pattern ensemble technique that uses both characteristic and non-characteristic patterns to rank U examples by their likelihood of being positive. We also present a novel stopping criterion to estimate the number of positive examples in U. Together, these enable us to effectively label all U training examples and conduct supervised shapelet discovery. The shapelets are then used to build a one-nearest-neighbor classifier for online classification. Extensive experiments demonstrate the effectiveness of our method.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
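For readers unfamiliar with shapelet-based classification, the following is a minimal sketch of the general idea only, not the authors' method. It assumes the common definition of shapelet-to-series distance (the smallest Euclidean distance over all equal-length windows) and assumes that each series is represented by its distances to the shapelets before the one-nearest-neighbor lookup. The function names, the omission of normalization, and the way the shapelets feed the classifier are all illustrative assumptions; the PU labeling and shapelet discovery steps described in the abstract are not shown.

```python
import math


def subsequence_distance(shapelet, series):
    """Smallest Euclidean distance between the shapelet and any
    equal-length window of the series (no normalization: an assumption)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(shapelet, window)))
        best = min(best, d)
    return best


def classify_1nn(query, training, shapelets):
    """Hypothetical 1NN classifier: each series is mapped to its vector of
    distances to the shapelets, and the query takes the label of the
    closest training series in that space.

    training is a list of (series, label) pairs.
    """
    def embed(series):
        return [subsequence_distance(s, series) for s in shapelets]

    q = embed(query)
    best_label, best_dist = None, math.inf
    for series, label in training:
        d = math.dist(q, embed(series))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Because only the best-matching window counts, readings outside the characteristic pattern do not affect the distance, which is the property the abstract contrasts with methods that use all readings.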
Enhancing linear time complexity time series classification with hybrid bag-of-patterns
- Authors: Liang, Shen; Zhang, Yanchun; Ma, Jiangang
- Date: 2020
- Type: Text, Conference paper
- Relation: 25th International Conference on Database Systems for Advanced Applications, DASFAA 2020 Vol. 12112 LNCS, p. 717-735
- Full Text: false
- Description: In time series classification, one of the most popular models is Bag-Of-Patterns (BOP). Most BOP methods run in super-linear time. A recent work proposed a linear-time BOP model, yet it has limited accuracy. In this work, we present Hybrid Bag-Of-Patterns (HBOP), which can greatly enhance accuracy while maintaining linear complexity. Concretely, we first propose a novel time series discretization method called SLA, which can retain more information than the classic SAX. We use a hybrid of SLA and SAX to represent subsequences expressively and compactly, which is our most important design feature. Moreover, we develop an efficient time series transformation method that is key to achieving linear complexity. We also propose a novel X-means clustering subroutine to handle subclasses. Extensive experiments on over 100 datasets demonstrate the effectiveness and efficiency of our method. © 2020, Springer Nature Switzerland AG.
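As background for the discretization step mentioned above, here is a minimal sketch of the classic SAX transform only: z-normalize, average over equal-length segments (PAA), then map each segment mean to a symbol using breakpoints that split the standard normal distribution into equiprobable regions. It does not implement SLA or the hybrid representation, which the abstract does not specify. The function name, the default four-letter alphabet, and the requirement that the series length be divisible by the segment count are assumptions made for brevity.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="abcd"):
    """Classic SAX sketch: z-normalize, PAA, then Gaussian-breakpoint symbols."""
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")

    # 1. Z-normalize (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [0.0] * len(series) if sigma == 0 else [(x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: one mean per segment.
    seg = len(series) // n_segments
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]

    # 3. Breakpoints dividing N(0, 1) into len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Each PAA value becomes the symbol of the region it falls in.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))
```

A bag-of-patterns model would typically slide a window over each series, compute such a word per window, and count word occurrences; that step is omitted here.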
Active model selection for positive unlabeled time series classification
- Authors: Liang, Shen; Zhang, Yanchun; Ma, Jiangang
- Date: 2020
- Type: Text, Conference proceedings, Conference paper
- Relation: 36th IEEE International Conference on Data Engineering, ICDE 2020 Vol. 2020-April, p. 361-372
- Full Text: false
- Description: Positive unlabeled time series classification (PUTSC) refers to classifying time series with a set PL of positive labeled examples and a set U of unlabeled ones. Model selection for PUTSC is a largely untouched topic. In this paper, we look into PUTSC model selection, in what is, as far as we know, the first systematic study on this topic. Focusing on the widely adopted self-training one-nearest-neighbor (ST-1NN) paradigm, we propose a model selection framework based on active learning (AL). We present the novel concepts of self-training label propagation, pseudo label calibration principles and, ultimately, influence to fully exploit the mechanism of ST-1NN. Based on them, we develop an effective model performance evaluation strategy and three AL sampling strategies. Experiments on over 120 datasets and a case study in arrhythmia detection show that our methods can yield top performance in interactive environments, and can achieve near-optimal results by querying very limited numbers of labels from the AL oracle. © 2020 IEEE.
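The ST-1NN paradigm named above can be sketched roughly as follows. This is a generic illustration under stated assumptions, not the paper's framework: it uses plain Euclidean distance on equal-length series, and it stops after a fixed number of rounds (`n_rounds`, a hypothetical parameter), whereas choosing the distance measure and stopping point is precisely the model selection problem the abstract addresses. The active learning components are not shown.

```python
import math


def self_training_1nn(positives, unlabeled, n_rounds):
    """Generic self-training 1NN sketch for PU data.

    In each round, the unlabeled series closest to any currently labeled
    series is pseudo-labeled positive and joins the labeled set.
    Returns (pseudo_labeled_in_order, still_unlabeled).
    """
    labeled = list(positives)
    remaining = list(unlabeled)
    order = []
    for _ in range(min(n_rounds, len(remaining))):
        # Index of the unlabeled series nearest to the labeled set.
        best_i = min(
            range(len(remaining)),
            key=lambda i: min(math.dist(remaining[i], p) for p in labeled),
        )
        chosen = remaining.pop(best_i)
        labeled.append(chosen)  # later rounds can propagate from this example
        order.append(chosen)
    return order, remaining


picked, rest = self_training_1nn(
    positives=[[0, 0]],
    unlabeled=[[5, 5], [1, 1], [2, 2], [9, 9]],
    n_rounds=2,
)
print(picked, rest)
```

Each pseudo-labeled example becomes a potential nearest neighbor in later rounds, so labels propagate outward from PL; an early wrong pick can therefore mislead subsequent rounds.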