PU-shapelets : Towards pattern-based positive unlabeled classification of time series
- Authors: Liang, Shen , Zhang, Yanchun , Ma, Jiangang
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 24th International Conference on Database Systems for Advanced Applications, DASFAA 2019; Chiang Mai, Thailand; 22nd-25th April 2019; part of the Lecture Notes in Computer Science book series, also part of the Information Systems and Applications, incl. Internet/Web and HCI sub series Vol. 11446 LNCS, p. 87-103
- Full Text:
- Reviewed:
- Description: Real-world time series classification applications often involve positive unlabeled (PU) training data, where there are only a small set PL of positive labeled examples and a large set U of unlabeled ones. Most existing time series PU classification methods utilize all readings in the time series, making them sensitive to non-characteristic readings. Characteristic patterns named shapelets present a promising solution to this problem, yet discovering shapelets under PU settings is not easy. In this paper, we take on the challenging task of shapelet discovery with PU data. We propose a novel pattern ensemble technique utilizing both characteristic and non-characteristic patterns to rank U examples by their possibilities of being positive. We also present a novel stopping criterion to estimate the number of positive examples in U. These enable us to effectively label all U training examples and conduct supervised shapelet discovery. The shapelets are then used to build a one-nearest-neighbor classifier for online classification. Extensive experiments demonstrate the effectiveness of our method.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Mining outlying aspects on healthcare data
- Authors: Samariya, Durgesh , Ma, Jiangang
- Date: 2021
- Type: Text , Conference paper
- Relation: 10th International Conference on Health Information Science, HIS 2021, Melbourne, 25-28 October 2021, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 13079 LNCS, p. 160-170
- Full Text: false
- Reviewed:
- Description: Machine learning and artificial intelligence have a wide range of applications in the medical domain, such as detecting anomalous readings and anomalous patient health conditions. Many algorithms have been developed to solve this problem. However, they fail to explain why those entries are considered outliers. This research gap leads to the outlying aspect mining problem, which aims to discover the set of features (a.k.a. subspace) in which a given data point is dramatically different from the others. In this paper, we present an interesting application of outlying aspect mining in the medical domain. This paper aims to effectively and efficiently identify outlying aspects using different outlying aspect mining algorithms and evaluate their performance on different real-world healthcare datasets. The experimental results show that the latest isolation-based outlying aspect mining measure, SiNNE, achieves outstanding performance on this task. © 2021, Springer Nature Switzerland AG.
Human pose based video compression via forward-referencing using deep learning
- Authors: Rajin, S.M. Ataul Karim , Murshed, Manzur , Paul, Manoranjan , Teng, Shyh , Ma, Jiangang
- Date: 2022
- Type: Text , Conference paper
- Relation: 2022 IEEE International Conference on Visual Communications and Image Processing, VCIP 2022, Suzhou, China, 13-16 December 2022
- Full Text: false
- Reviewed:
- Description: To exploit high temporal correlations in video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translation motion of the moving objects, it is susceptible to other types of affine motion and object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual frames in future time by predicting the pose using a generative adversarial network (GAN). Therefore, modelling the high-level structure of human pose is able to exploit semantic correlation by predicting human actions and determining their trajectories. Video surveillance applications will benefit as stored 'big' surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding by modelling human pose from the already-encoded frames and using the generated frame at the current time as an additional forward-referencing frame. It is expected that the proposed approach can overcome the limitations of the traditional backward-referencing frames by predicting the blocks containing the moving objects with lower residuals. Our experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high motion video sequences compared to standard video coding. © 2022 IEEE.
Enhancing linear time complexity time series classification with hybrid bag-of-patterns
- Authors: Liang, Shen , Zhang, Yanchun , Ma, Jiangang
- Date: 2020
- Type: Text , Conference paper
- Relation: 25th International Conference on Database Systems for Advanced Applications, DASFAA 2020 Vol. 12112 LNCS, p. 717-735
- Full Text: false
- Reviewed:
- Description: In time series classification, one of the most popular models is Bag-Of-Patterns (BOP). Most BOP methods run in super-linear time. A recent work proposed a linear time BOP model, yet it has limited accuracy. In this work, we present Hybrid Bag-Of-Patterns (HBOP), which can greatly enhance accuracy while maintaining linear complexity. Concretely, we first propose a novel time series discretization method called SLA, which can retain more information than the classic SAX. We use a hybrid of SLA and SAX to expressively and compactly represent subsequences, which is our most important design feature. Moreover, we develop an efficient time series transformation method that is key to achieving linear complexity. We also propose a novel X-means clustering subroutine to handle subclasses. Extensive experiments on over 100 datasets demonstrate the effectiveness and efficiency of our method. © 2020, Springer Nature Switzerland AG.
Bilateral insider threat detection : harnessing standalone and sequential activities with recurrent neural networks
- Authors: Manoharan, Phavithra , Hong, Wei , Yin, Jiao , Zhang, Yanchun , Ye, Wenjie , Ma, Jiangang
- Date: 2023
- Type: Text , Conference paper
- Relation: 24th International Conference on Web Information Systems Engineering, WISE 2023, Melbourne, 25-27 October 2023, Web Information Systems Engineering – WISE 2023, 24th International Conference, Melbourne, VIC, Australia, October 25–27, 2023, Proceedings Vol. 14306 LNCS, p. 179-188
- Full Text: false
- Reviewed:
- Description: Insider threats involving authorised individuals exploiting their access privileges within an organisation can yield substantial damage compared to external threats. Conventional detection approaches analyse user behaviours from logs, using binary classifiers to distinguish between malicious and non-malicious users. However, existing methods focus solely on standalone or sequential activities. To enhance the detection of malicious insiders, we propose a novel approach: bilateral insider threat detection combining RNNs to incorporate standalone and sequential activities. Initially, we extract behavioural traits from log files representing standalone activities. Subsequently, RNN models capture features of sequential activities. Concatenating these features, we employ binary classification to detect insider threats effectively. Experiments on the CERT 4.2 dataset showcase the approach’s superiority, significantly enhancing insider threat detection using features from both standalone and sequential activities. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Anomaly detection on health data
- Authors: Samariya, Durgesh , Ma, Jiangang
- Date: 2022
- Type: Text , Conference paper
- Relation: 11th International Conference on Health Information Science, HIS 2022, Virtual, Online, 28-30 October 2022, Health Information Science, 11th International Conference, HIS 2022, Virtual Event, October 28–30, 2022, Proceedings Vol. 13705 LNCS, p. 34-41
- Full Text: false
- Reviewed:
- Description: The identification of anomalous records in medical data is an important problem with numerous applications such as detecting anomalous readings, anomalous patient health conditions, health insurance fraud and faults in mechanical components. This paper compares the performance of seven state-of-the-art anomaly detection algorithms in detecting anomalies in healthcare data. Our experimental results on six datasets show that the state-of-the-art isolation-based method iForest performs better overall in terms of AUC and runtime. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Active model selection for positive unlabeled time series classification
- Authors: Liang, Shen , Zhang, Yanchun , Ma, Jiangang
- Date: 2020
- Type: Text , Conference proceedings , Conference paper
- Relation: 36th IEEE International Conference on Data Engineering, ICDE 2020 Vol. 2020-April, p. 361-372
- Full Text: false
- Reviewed:
- Description: Positive unlabeled time series classification (PUTSC) refers to classifying time series with a set PL of positive labeled examples and a set U of unlabeled ones. Model selection for PUTSC is a largely untouched topic. In this paper, we look into PUTSC model selection, which as far as we know is the first systematic study in this topic. Focusing on the widely adopted self-training one-nearest-neighbor (ST-1NN) paradigm, we propose a model selection framework based on active learning (AL). We present the novel concepts of self-training label propagation, pseudo label calibration principles and ultimately influence to fully exploit the mechanism of ST-1NN. Based on them, we develop an effective model performance evaluation strategy and three AL sampling strategies. Experiments on over 120 datasets and a case study in arrhythmia detection show that our methods can yield top performance in interactive environments, and can achieve near optimal results by querying very limited numbers of labels from the AL oracle. © 2020 IEEE.
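The self-training one-nearest-neighbor (ST-1NN) paradigm named in the record above can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the function names, and the fixed number of examples to label (`n_to_label`) are assumptions made purely for illustration, and the paper's model selection and stopping strategies are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_training_1nn(pl, u, n_to_label):
    """Return indices of U examples in the order they are pseudo-labeled positive.

    pl: list of positive labeled series (the set PL).
    u: list of unlabeled series (the set U).
    n_to_label: hypothetical fixed stopping point (an assumption, not from the paper).
    """
    labeled = list(pl)
    remaining = list(range(len(u)))
    order = []
    for _ in range(min(n_to_label, len(remaining))):
        # Pick the unlabeled example whose nearest labeled neighbor is closest.
        best = min(
            remaining,
            key=lambda i: min(euclidean(u[i], p) for p in labeled),
        )
        labeled.append(u[best])  # the new pseudo-positive can attract later examples
        remaining.remove(best)
        order.append(best)
    return order


# Toy data: two unlabeled points lie near the single labeled positive.
pl_toy = [[0.0, 0.0]]
u_toy = [[5.0, 5.0], [0.1, 0.0], [0.3, 0.0], [6.0, 5.0]]
ranking = self_training_1nn(pl_toy, u_toy, n_to_label=2)
```

In this toy run, each newly pseudo-labeled example joins the labeled set, so later picks are measured against it as well as against the original positive.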
A weighted overlook graph representation of eeg data for absence epilepsy detection
- Authors: Wang, Jialin , Liang, Shen , Wang, Ye , Zhang, Yanchun , Ma, Jiangang
- Date: 2020
- Type: Text , Conference proceedings , Conference paper
- Relation: 20th IEEE International Conference on Data Mining, ICDM 2020 Vol. 2020-November, p. 581-590
- Full Text: false
- Reviewed:
- Description: Absence epilepsy is one of the most common types of epilepsy. The diagnosis of absence epilepsy is among the greatest challenges faced by clinical neurologists due to a lack of easily observable symptoms that are present in conventional epilepsy (e.g. spasm and convulsion), and relies heavily on the detection of Spike and Slow Waves (SSWs) in Electroencephalogram (EEG) signals. Recently, graph representations called complex networks have been increasingly applied to characterizing 1D EEG signals. However, existing methods often fail to effectively represent SSWs, struggling to capture the differences between SSW waveforms and their non-SSW counterparts, such as minute differences and distinct shapes. Addressing this issue, in this work, we propose two simple yet effective complex networks, Overlook Graph (OG) and Weighted Overlook Graph (WOG), which have been customized to expressively represent SSWs. Built upon OG and WOG, we then develop a 2D Convolutional Neural Network (2D-CNN) to further learn latent features from the graph representations and accomplish the detection task. Extensive experiments on a real-world absence epilepsy EEG dataset show that the proposed OG/WOG-2D-CNN method can accurately detect SSWs. Additional experiments on the well-known Bonn dataset further show that our method can generalize to the conventional epilepsy seizure detection task with highly competitive performances. © 2020 IEEE. Please note that there are multiple authors for this article; therefore only the names of the first 5, including Federation University Australia affiliate "Jiangang Ma", are provided in this record.
A new effective and efficient measure for outlying aspect mining
- Authors: Samariya, Durgesh , Aryal, Sunil , Ting, Kai , Ma, Jiangang
- Date: 2020
- Type: Text , Conference paper
- Relation: 21st International Conference on Web Information Systems Engineering, WISE 2020, Amsterdam, 20-24 October 2020, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 12343 LNCS, p. 463-474
- Full Text: false
- Reviewed:
- Description: Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given data set. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these distance/density-based scores depend on the dimensionality of subspaces, they cannot be compared directly between subspaces of different dimensionality. Z-score normalisation has been used to make them comparable. It requires computing the outlier scores of all instances in each subspace. This adds significant computational overhead on top of already expensive density estimation, making OAM algorithms infeasible to run on large and/or high-dimensional datasets. We also discover that Z-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which is independent of the dimensionality of subspaces. This enables the scores in subspaces with different dimensionalities to be compared directly without any additional normalisation. Our experimental results revealed that SiNNE produces better or at least the same results as existing scores, and that it significantly improves the runtime of an existing OAM algorithm based on beam search. © 2020, Springer Nature Switzerland AG.
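The Z-score normalisation step described in the record above can be sketched as follows. The raw scores and the function name are hypothetical and chosen only for illustration; this sketch shows why the normalisation needs the scores of every instance in each subspace, and it does not implement SiNNE or any scoring function from the paper.

```python
from statistics import mean, pstdev


def z_score_normalise(query_score, all_scores):
    """Z-score of the query's raw outlier score within one subspace.

    all_scores must hold the raw scores of every instance in that subspace,
    which is the extra cost the abstract refers to.
    """
    mu = mean(all_scores)
    sigma = pstdev(all_scores)
    if sigma == 0:
        return 0.0  # all instances score the same, so the query is not distinguishable
    return (query_score - mu) / sigma


# Hypothetical raw scores in two subspaces whose scales differ tenfold.
scores_low_dim = [1.0, 2.0, 3.0, 4.0, 10.0]
scores_high_dim = [10.0, 20.0, 30.0, 40.0, 100.0]

# The query is the last instance in both subspaces.
z_low = z_score_normalise(scores_low_dim[-1], scores_low_dim)
z_high = z_score_normalise(scores_high_dim[-1], scores_high_dim)
```

Although the raw scores in the two hypothetical subspaces differ by a factor of ten, the normalised values coincide, which is what makes ranking across subspaces possible at the cost of scoring every instance.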