A new loss function for robust classification
- Authors: Zhao, Lei , Mammadov, Musa , Yearwood, John
- Date: 2014
- Type: Text , Journal article
- Relation: Intelligent Data Analysis Vol. 18, no. 4 (2014), p. 697-715
- Full Text: false
- Reviewed:
- Description: Loss function plays an important role in data classification. Manyloss functions have been proposed and applied to differentclassification problems. This paper proposes a new so called thesmoothed 0-1 loss function, that could be considered as anapproximation of the classical 0-1 loss function. Due to thenon-convexity property of the proposed loss function, globaloptimization methods are required to solve the correspondingoptimization problems. Together with the proposed loss function, wecompare the performance of several existing loss functions in theclassification of noisy data sets. In this comparison, differentoptimization problems are considered in regards to the convexity andsmoothness of different loss functions. The experimental resultsshow that the proposed smoothed 0-1 loss function works better ondata sets with noisy labels, noisy features, and outliers. © 2014 - IOS Press and the authors. All rights reserved.
ACSP-Tree: A tree structure for mining behavioral patterns from wireless sensor networks
- Authors: Rashid, Md. Mamunur , Gondal, Iqbal , Kamruzzaman, Joarder
- Date: 2013
- Type: Text , Conference paper
- Relation: IEEE Conference on Local Computer Networks (LCN 2013) (21 October 2013 to 24 October 2013) p. 691-694
- Full Text: false
- Reviewed:
- Description: WSNs generates a large amount of data in the form of stream and mining knowledge from the stream of data can be extremely useful. Association rules mining, from the sensor data, has been studied in recent literature. However, sensor association rules mining often produces a huge number of rules, but most of them either are redundant or fail to reflect the true correlation relationship among data objects. In this paper, we address this problem and propose mining of a new type of sensor behavioral pattern called associated-correlated sensor patterns. The proposed behavioral patterns capture not only association-like co-occurrences but also the substantial temporal correlations implied by such co-occurrences in the sensor data. Here, we also use a prefix tree-based structure called associated-correlated sensor pattern-tree (ACSP-tree), which facilitates frequent pattern (FP) growth-based mining technique to generate all associated-correlated patterns from WSN data with only one scan over the sensor database. Extensive performance study shows that our approach is time and memory efficient in finding associated-correlated patterns than the existing most efficient algorithms.
Classification through incremental max-min separability
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean , Karasozen, Bulent
- Date: 2011
- Type: Text , Journal article
- Relation: Pattern Analysis and Applications Vol. 14, no. 2 (2011), p. 165-174
- Relation: http://purl.org/au-research/grants/arc/DP0666061
- Full Text: false
- Reviewed:
- Description: Piecewise linear functions can be used to approximate non-linear decision boundaries between pattern classes. Piecewise linear boundaries are known to provide efficient real-time classifiers. However, they require a long training time. Finding piecewise linear boundaries between sets is a difficult optimization problem. Most approaches use heuristics to avoid solving this problem, which may lead to suboptimal piecewise linear boundaries. In this paper, we propose an algorithm for globally training hyperplanes using an incremental approach. Such an approach allows one to find a near global minimizer of the classification error function and to compute as few hyperplanes as needed for separating sets. We apply this algorithm for solving supervised data classification problems and report the results of numerical experiments on real-world data sets. These results demonstrate that the new algorithm requires a reasonable training time and its test set accuracy is consistently good on most data sets compared with mainstream classifiers. © 2010 Springer-Verlag London Limited.
Isolation forest
- Authors: Liu, Fei , Ting, Kaiming , Zhou, Zhi-Hua
- Date: 2008
- Type: Text , Conference paper
- Relation: Proceedings of the Eighth IEEE International Conference on Data Mining p. 413-422
- Full Text: false
- Reviewed:
- Description: Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiles normal points. To our best knowledge, the concept of isolation has not been explored in current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm which has a linear time complexity with a low constant and a low memory requirement. Our empirical evaluation shows that iForest performs favourably to ORCA, a near-linear time complexity distance-based method, LOF and random forests in terms of AUC and processing time, and especially in large data sets. iForest also works well in high dimensional problems which have a large number of irrelevant attributes, and in situations where training set does not contain any anomalies.
Issues of grid-cluster retrievals in swarm-based clustering
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2008
- Type: Text , Conference paper
- Relation: Proceedings of the 2008 IEEE World Congress on Computational Intelligence p. 511-518
- Full Text: false
- Reviewed:
- Description: One common approach in swarm-based clustering is to use agents to create a set of clusters on a two-dimensional grid, and then use an existing clustering method to retrieve the clusters on the grid. The second step, which we call grid-cluster retrieval, is an essential step to obtain an explicit partitioning of data. In this study, we highlight the issues in grid-cluster retrievals commonly neglected by researchers, and demonstrate the non-trivial difficulties involved. To tackle the issues, we then evaluate three methods: K-means, hierarchical clustering (Weighted Single-link) and density-based clustering (DBScan). Among the three methods, DBScan is the only method which has not been previously used for grid-cluster retrievals, yet it is shown to be the most suitable method in terms of effectiveness and efficiency.
Regularly frequent patterns mining from sensor data stream
- Authors: Rashid, Md. Mamunur , Gondal, Iqbal , Kamruzzaman, Joarder
- Date: 2013
- Type: Text , Conference paper
- Relation: International Conference on Neural Information Processing (ICONIP 2013) p. 417-424
- Full Text: false
- Reviewed:
- Description: Mining interesting and useful knowledge from the huge amount of data gathered in wireless sensor networks is a challenging task. Works reported in literature use support metric-based sensor association rule which employs the occurrence frequency of patterns as criteria. Such criteria may not be appropriate for finding significant patterns. Moreover, temporal regularity in occurrence behavior should be considered as another important measure for assessing the importance of patterns in WSNs. Frequent sensor patterns that occur after regular intervals is called regularly frequent sensor patterns. Even though mining regularly frequent sensor patterns from sensor data stream is extremely important in many real-time applications, no such algorithm has been proposed yet. In this paper, we propose a novel tree structure called Regularly Frequent Sensor Pattern-tree (RSP-tree) and an efficient mining approach for finding regularly frequent sensor patterns from WSNs. Extensive performance analyses show that our technique is time and memory efficient in finding regularly frequent sensor patterns.
Using meta-regression data mining to improve predictions of performance based on heart rate dynamics for Australian football
- Authors: Jelinek, Herbert , Kelarev, Andrei , Robinson, Dean , Stranieri, Andrew , Cornforth, David
- Date: 2014
- Type: Text , Journal article
- Relation: Applied Soft Computing Vol. 14, no. PART A (2014), p. 81-87
- Full Text: false
- Reviewed:
- Description: This work investigates the effectiveness of using computer-based machine learning regression algorithms and meta-regression methods to predict performance data for Australian football players based on parameters collected during daily physiological tests. Three experiments are described. The first uses all available data with a variety of regression techniques. The second uses a subset of features selected from the available data using the Random Forest method. The third used meta-regression with the selected feature subset. Our experiments demonstrate that feature selection and meta-regression methods improve the accuracy of predictions for match performance of Australian football players based on daily data of medical tests, compared to regression methods alone. Meta-regression methods and feature selection were able to obtain performance prediction outcomes with significant correlation coefficients. The best results were obtained by the additive regression based on isotonic regression for a set of most influential features selected by Random Forest. This model was able to predict athlete performance data with a correlation coefficient of 0.86 (p < 0.05). © 2013 Published by Elsevier B.V. All rights reserved.
- Description: C1