PU-shapelets: Towards pattern-based positive unlabeled classification of time series
- Authors: Liang, Shen , Zhang, Yanchun , Ma, Jiangang
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019; Chiang Mai, Thailand; 22nd-25th April 2019; part of the Lecture Notes in Computer Science book series, also part of the Information Systems and Applications, incl. Internet/Web and HCI subseries Vol. 11446 LNCS, p. 87-103
- Full Text:
- Reviewed:
- Description: Real-world time series classification applications often involve positive unlabeled (PU) training data, where there is only a small set PL of positive labeled examples and a large set U of unlabeled ones. Most existing time series PU classification methods utilize all readings in the time series, making them sensitive to non-characteristic readings. Characteristic patterns named shapelets present a promising solution to this problem, yet discovering shapelets in PU settings is difficult. In this paper, we take on the challenging task of shapelet discovery with PU data. We propose a novel pattern ensemble technique utilizing both characteristic and non-characteristic patterns to rank U examples by their likelihood of being positive. We also present a novel stopping criterion to estimate the number of positive examples in U. Together, these enable us to effectively label all U training examples and conduct supervised shapelet discovery. The shapelets are then used to build a one-nearest-neighbor classifier for online classification. Extensive experiments demonstrate the effectiveness of our method.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
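The record above says the discovered shapelets feed a one-nearest-neighbor classifier. As a rough illustration of that final step only, the Python sketch below uses a common shapelet distance (the minimum Euclidean distance between a shapelet and any equal-length window of a series), represents each series by its distances to a set of shapelets, and classifies a query by 1NN in that space. The function names are hypothetical, the shapelets are assumed to be given, and the PU-specific parts of the abstract (pattern-ensemble ranking, the stopping criterion, shapelet discovery itself) are not reproduced; the paper's exact distance and normalization may differ.

```python
import math
from typing import Hashable, List, Sequence


def subsequence_distance(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Minimum Euclidean distance between `shapelet` and any equal-length window."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    return min(
        math.dist(shapelet, series[start:start + m])
        for start in range(len(series) - m + 1)
    )


def shapelet_transform(series: Sequence[float],
                       shapelets: List[Sequence[float]]) -> List[float]:
    """Represent a series by its distances to each (given) shapelet."""
    return [subsequence_distance(s, series) for s in shapelets]


def one_nn_predict(train_features: List[List[float]],
                   train_labels: List[Hashable],
                   query_features: List[float]) -> Hashable:
    """Return the label of the training example nearest to the query."""
    best_label, best_dist = None, math.inf
    for feats, label in zip(train_features, train_labels):
        d = math.dist(feats, query_features)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Measuring distances in the shapelet-transformed space, rather than on raw readings, is what lets the classifier ignore the non-characteristic parts of a series that the abstract identifies as a weakness of whole-series methods.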
A weighted overlook graph representation of EEG data for absence epilepsy detection
- Authors: Wang, Jialin , Liang, Shen , Wang, Ye , Zhang, Yanchun , Ma, Jiangang
- Date: 2020
- Type: Text , Conference proceedings , Conference paper
- Relation: 20th IEEE International Conference on Data Mining, ICDM 2020 Vol. 2020-November, p. 581-590
- Full Text: false
- Reviewed:
- Description: Absence epilepsy is one of the most common types of epilepsy. The diagnosis of absence epilepsy is among the greatest challenges faced by clinical neurologists, because absence epilepsy lacks the easily observable symptoms of conventional epilepsy (e.g., spasms and convulsions), so diagnosis relies heavily on the detection of Spike and Slow Waves (SSWs) in Electroencephalogram (EEG) signals. Recently, graph representations called complex networks have been increasingly applied to characterize 1D EEG signals. However, existing methods often fail to represent SSWs effectively, struggling to capture the differences between SSW waveforms and their non-SSW counterparts, such as minute differences and distinct shapes. To address this issue, we propose two simple yet effective complex networks, Overlook Graph (OG) and Weighted Overlook Graph (WOG), which are customized to expressively represent SSWs. Built upon OG and WOG, we then develop a 2D Convolutional Neural Network (2D-CNN) to further learn latent features from the graph representations and accomplish the detection task. Extensive experiments on a real-world absence epilepsy EEG dataset show that the proposed OG/WOG-2D-CNN method can accurately detect SSWs. Additional experiments on the well-known Bonn dataset further show that our method can generalize to the conventional epilepsy seizure detection task with highly competitive performance. © 2020 IEEE. *Please note that there are multiple authors for this article; therefore, only the names of the first 5, including Federation University Australia affiliate "Jiangang Ma", are provided in this record.*
Active model selection for positive unlabeled time series classification
- Authors: Liang, Shen , Zhang, Yanchun , Ma, Jiangang
- Date: 2020
- Type: Text , Conference proceedings , Conference paper
- Relation: 36th IEEE International Conference on Data Engineering, ICDE 2020 Vol. 2020-April, p. 361-372
- Full Text: false
- Reviewed:
- Description: Positive unlabeled time series classification (PUTSC) refers to classifying time series with a set PL of positive labeled examples and a set U of unlabeled ones. Model selection for PUTSC is a largely untouched topic. In this paper, we present what is, to the best of our knowledge, the first systematic study of PUTSC model selection. Focusing on the widely adopted self-training one-nearest-neighbor (ST-1NN) paradigm, we propose a model selection framework based on active learning (AL). We present the novel concepts of self-training label propagation, pseudo label calibration principles and, ultimately, influence to fully exploit the mechanism of ST-1NN. Based on them, we develop an effective model performance evaluation strategy and three AL sampling strategies. Experiments on over 120 datasets and a case study in arrhythmia detection show that our methods can yield top performance in interactive environments and can achieve near-optimal results by querying very limited numbers of labels from the AL oracle. © 2020 IEEE.
- Description: E1
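To make the ST-1NN paradigm named above more concrete, here is a minimal sketch of generic self-training 1NN for PU data, assuming equal-length series and Euclidean distance: starting from PL, it repeatedly moves the unlabeled example nearest to the current positive set into that set. The function name and the fixed budget `num_positive` are hypothetical; the budget stands in for whatever stopping rule a real model would use, and none of the paper's model selection, active learning, or influence machinery is shown.

```python
import math
from typing import List, Sequence


def self_training_1nn(positives: List[Sequence[float]],
                      unlabeled: List[Sequence[float]],
                      num_positive: int) -> List[int]:
    """Pseudo-label `unlabeled` (1 = positive, 0 = negative) by self-training 1NN.

    `num_positive` is a hypothetical fixed budget replacing a real stopping
    criterion; all series are assumed to have the same length.
    """
    if not positives:
        raise ValueError("need at least one positive labeled example")
    labels = [0] * len(unlabeled)
    remaining = set(range(len(unlabeled)))
    # Distance from each unlabeled series to its nearest current positive.
    nearest = [min(math.dist(u, p) for p in positives) for u in unlabeled]
    for _ in range(min(num_positive, len(unlabeled))):
        # Label the unlabeled series closest to the positive set as positive.
        idx = min(remaining, key=lambda i: nearest[i])
        remaining.remove(idx)
        labels[idx] = 1
        # The newly labeled series may now be the nearest positive for others.
        for i in remaining:
            nearest[i] = min(nearest[i], math.dist(unlabeled[i], unlabeled[idx]))
    return labels
```

Examples never selected keep the pseudo label 0 (negative). Updating each example's nearest-positive distance incrementally avoids recomputing distances to the whole positive set in every round.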
Enhancing linear time complexity time series classification with hybrid bag-of-patterns
- Authors: Liang, Shen , Zhang, Yanchun , Ma, Jiangang
- Date: 2020
- Type: Text , Conference paper
- Relation: 25th International Conference on Database Systems for Advanced Applications, DASFAA 2020 Vol. 12112 LNCS, p. 717-735
- Full Text: false
- Reviewed:
- Description: In time series classification, one of the most popular models is Bag-Of-Patterns (BOP). Most BOP methods run in super-linear time. A recent work proposed a linear-time BOP model, yet it has limited accuracy. In this work, we present Hybrid Bag-Of-Patterns (HBOP), which can greatly enhance accuracy while maintaining linear complexity. Concretely, we first propose a novel time series discretization method called SLA, which can retain more information than the classic SAX. We use a hybrid of SLA and SAX to expressively and compactly represent subsequences, which is our most important design feature. Moreover, we develop an efficient time series transformation method that is key to achieving linear complexity. We also propose a novel X-means clustering subroutine to handle subclasses. Extensive experiments on over 100 datasets demonstrate the effectiveness and efficiency of our method. © 2020, Springer Nature Switzerland AG.
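The abstract contrasts the proposed SLA discretization with the classic SAX. SLA is not defined in this record, so the sketch below covers only the standard SAX pipeline (z-normalization, piecewise aggregate approximation, and equiprobable Gaussian breakpoints) plus a naive bag-of-patterns histogram over sliding windows. The function names are hypothetical; the sketch assumes the window length is divisible by the word length `w`, omits numerosity reduction, and runs in O(n * window) time, so it is not the linear-time HBOP method.

```python
import bisect
from collections import Counter
from statistics import NormalDist, fmean, pstdev
from typing import List, Sequence

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def znormalize(x: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sd = fmean(x), pstdev(x)
    if sd < 1e-8:  # near-constant window: treat as all zeros
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def paa(x: Sequence[float], w: int) -> List[float]:
    """Piecewise aggregate approximation: mean of each of w equal segments."""
    if len(x) % w != 0:
        raise ValueError("this sketch assumes len(x) is divisible by w")
    seg = len(x) // w
    return [fmean(x[i * seg:(i + 1) * seg]) for i in range(w)]


def sax(x: Sequence[float], w: int, a: int) -> str:
    """Classic SAX word of length w over an alphabet of size a (2 <= a <= 26)."""
    if not 2 <= a <= len(ALPHABET):
        raise ValueError("alphabet size must be between 2 and 26")
    # Breakpoints split N(0, 1) into a equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    return "".join(ALPHABET[bisect.bisect_right(breakpoints, v)]
                   for v in paa(znormalize(x), w))


def bag_of_patterns(series: Sequence[float], window: int, w: int, a: int) -> Counter:
    """Histogram of SAX words over all sliding windows (no numerosity reduction)."""
    counts: Counter = Counter()
    for start in range(len(series) - window + 1):
        counts[sax(series[start:start + window], w, a)] += 1
    return counts
```

For example, a steadily rising window such as `[1, 1, 2, 2, 3, 3, 4, 4]` with `w=4` and `a=4` maps to the word `"abcd"`. The resulting word histograms can then be compared across series, which is the basic idea behind BOP classifiers.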