Learning by extrapolation from marginal to full-multivariate probability distributions: Decreasingly naive Bayesian classification
- Authors: Webb, Geoffrey , Boughton, Janice , Zheng, Fei , Ting, Kaiming , Salem, Houssam
- Date: 2012
- Type: Text , Journal article
- Relation: Machine Learning Vol. 86, no. 2 (2012), p. 233-272
- Full Text: false
- Reviewed:
- Description: Averaged n-Dependence Estimators (AnDE) is an approach to probabilistic classification learning that learns by extrapolation from marginal to full-multivariate probability distributions. It utilizes a single parameter that transforms the approach between a low-variance high-bias learner (Naive Bayes) and a high-variance low-bias learner with Bayes optimal asymptotic error. It extends the underlying strategy of Averaged One-Dependence Estimators (AODE), which relaxes the Naive Bayes independence assumption while retaining many of Naive Bayes’ desirable computational and theoretical properties. AnDE further relaxes the independence assumption by generalizing AODE to higher levels of dependence. Extensive experimental evaluation shows that the bias-variance trade-off for Averaged 2-Dependence Estimators results in strong predictive accuracy over a wide range of data sets. It has training time linear with respect to the number of examples, learns in a single pass through the training data, supports incremental learning, directly handles missing values, and is robust in the face of noise. Beyond the practical utility of its lower-dimensional variants, AnDE is of interest in that it demonstrates that it is possible to create low-bias high-variance generative learners and suggests strategies for developing even more powerful classifiers.
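The underlying AODE scheme (the n = 1 member of the AnDE family) averages one-dependence estimates, taking each attribute in turn as a "super-parent". A minimal sketch of that averaging idea on hypothetical toy data follows; the data, variable names, and absence of smoothing are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

# Toy training data: each row is (class, (x1, x2, x3)) with discrete attributes.
data = [
    ("pos", ("a", "m", "t")),
    ("pos", ("a", "n", "t")),
    ("neg", ("b", "m", "f")),
    ("neg", ("b", "n", "f")),
]

# Joint counts AODE needs: counts for (y, x_p) and (y, x_p, x_j).
pair = defaultdict(int)    # (y, p, x_p) -> count
triple = defaultdict(int)  # (y, p, x_p, j, x_j) -> count
n = len(data)
for y, xs in data:
    for p, xp in enumerate(xs):
        pair[(y, p, xp)] += 1
        for j, xj in enumerate(xs):
            triple[(y, p, xp, j, xj)] += 1

def aode_score(y, xs, eps=1e-9):
    """Average of one-dependence estimates: mean over super-parents p of
    P(y, x_p) * prod_{j != p} P(x_j | y, x_p)."""
    total = 0.0
    for p, xp in enumerate(xs):
        prob = pair[(y, p, xp)] / n
        for j, xj in enumerate(xs):
            if j == p:
                continue
            prob *= triple[(y, p, xp, j, xj)] / max(pair[(y, p, xp)], 1)
        total += prob
    return total / len(xs) + eps

def classify(xs):
    return max({"pos", "neg"}, key=lambda y: aode_score(y, xs))
```

AnDE generalizes this by using sets of n attributes as super-parents rather than single attributes.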
Local and global algorithms for learning dynamic Bayesian networks
- Authors: Nguyen, Vinh , Chetty, Madhu , Coppel, Ross , Wangikar, Pramod
- Date: 2012
- Type: Text , Conference paper
- Relation: The 12th IEEE International Conference on Data Mining (ICDM 2012) p. 685-694
- Full Text: false
- Reviewed:
- Description: Learning optimal Bayesian networks (BN) from data is NP-hard in general. Nevertheless, certain BN classes with additional topological constraints, such as the dynamic BN (DBN) models, widely applied in specific fields such as systems biology, can be efficiently learned in polynomial time. Such algorithms have been developed for the Bayesian-Dirichlet (BD), Minimum Description Length (MDL), and Mutual Information Test (MIT) scoring metrics. The BD-based algorithm admits a large polynomial bound, hence it is impractical for even modestly sized networks. The MDL- and MIT-based algorithms admit much smaller bounds, but require a very restrictive assumption that all variables have the same cardinality, thus significantly limiting their applicability. In this paper, we first propose an improvement to the MDL- and MIT-based algorithms, dropping the equicardinality constraint, thus significantly enhancing their generality. We also explore local Markov blanket based algorithms for constructing BN in the context of DBN, and show an interesting result: under the faithfulness assumption, the mutual information test based local Markov blanket algorithms yield the same network as learned by the global optimization MIT-based algorithm. Experimental validation on small and large scale genetic networks demonstrates the effectiveness of our proposed approaches.
Performance comparisons of contour-based corner detectors
- Authors: Awrangjeb, Mohammad , Lu, Guojun , Fraser, Clive
- Date: 2012
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 21, no. 9 (2012), p. 4167-4179
- Full Text: false
- Reviewed:
- Description: Corner detectors have many applications in computer vision and image identification and retrieval. Contour-based corner detectors directly or indirectly estimate a significance measure (e.g., curvature) on the points of a planar curve, and select the curvature extrema points as corners. While an extensive number of contour-based corner detectors have been proposed over the last four decades, there is no comparative study of recently proposed detectors. This paper is an attempt to fill this gap. The general framework of contour-based corner detection is presented, and two major issues – curve smoothing and curvature estimation, which have major impacts on the corner detection performance – are discussed. A number of promising detectors are compared using both automatic and manual evaluation systems on two large datasets. It is observed that while the detectors using indirect curvature estimation techniques are more robust, the detectors using direct curvature estimation techniques are faster.
Potential control of human immunodeficiency virus type 1 asp expression by alternative splicing in the upstream untranslated region
- Authors: Barbagallo, Michael , Birch, Kate , Deacon, Nicholas , Mosse, Jennifer
- Date: 2012
- Type: Text , Journal article
- Relation: DNA and Cell Biology Vol. 31, no. 7 (2012), p. 1303-1313
- Full Text: false
- Reviewed:
- Description: The negative-sense asp open reading frame (ORF) positioned opposite to the human immunodeficiency virus type 1 (HIV-1) env gene encodes the 189 amino acid, membrane-associated ASP protein. Negative-sense transcription, regulated by long terminal repeat sequences, has been observed early in HIV-1 infection in vitro. All subtypes of HIV-1 were scanned to detect the negative-sense asp ORF and to identify potential regulatory sequences. A series of highly conserved upstream short open reading frames (sORFs) was identified. This potential control region from HIV-1NL4-3, containing six sORFs, was cloned upstream of the reporter gene EGFP. Expression by transfection of HEK293 cells indicated that the introduction of this sORF region inhibits EGFP reporter expression; analysis of transcripts revealed no significant changes in levels of EGFP mRNA. Reverse transcriptase–polymerase chain reaction analysis (RT-PCR) further demonstrated that the upstream sORF region undergoes alternative splicing in vitro. The most abundant product is spliced to remove sORFs I to V, leaving only the in-frame sORF VI upstream of asp. Sequence analysis revealed the presence of typical splice donor- and acceptor-site motifs. Mutation of the highly conserved splice donor and acceptor sites modulates, but does not fully relieve, inhibition of EGFP production. The strong conservation of asp and its sORFs across all HIV-1 subtypes suggests that the asp gene product may have a role in the pathogenesis of HIV-1. Alternative splicing of the upstream sORF region provides a potential mechanism for controlling expression of the asp gene.
Recentred local profiles for authorship attribution
- Authors: Layton, Robert , Watters, Paul , Dazeley, Richard
- Date: 2012
- Type: Text , Journal article
- Relation: Natural Language Engineering Vol. 18, no. 3 (2012), p. 293-312
- Full Text:
- Reviewed:
- Description: Authorship attribution methods aim to determine the author of a document, by using information gathered from a set of documents with known authors. One method of performing this task is to create profiles containing distinctive features known to be used by each author. In this paper, a new method of creating an author or document profile is presented that detects features considered distinctive, compared to normal language usage. This recentring approach creates more accurate profiles than previous methods, as demonstrated empirically using a known corpus of authorship problems. This method, named recentred local profiles, determines authorship accurately using a simple 'best matching author' approach to classification, compared to other methods in the literature. The proposed method is shown to be more stable than related methods as parameter values change. Using a weighted voting scheme, recentred local profiles is shown to outperform other methods in authorship attribution, with an overall accuracy of 69.9% on the ad-hoc authorship attribution competition corpus, representing a significant improvement over related methods. Copyright © Cambridge University Press 2011.
Relevance feature mapping for content-based multimedia information retrieval
- Authors: Zhou, Guang , Ting, Kaiming , Liu, Fei , Yin, Yilong
- Date: 2012
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 45, no. 4 (2012), p. 1707-1720
- Full Text: false
- Reviewed:
- Description: This paper presents a novel ranking framework for content-based multimedia information retrieval (CBMIR). The framework introduces relevance features and a new ranking scheme. Each relevance feature measures the relevance of an instance with respect to a profile of the targeted multimedia database. We show that the task of CBMIR can be done more effectively using the relevance features than the original features. Furthermore, additional performance gain is achieved by incorporating our new ranking scheme which modifies instance rankings based on the weighted average of relevance feature values. Experiments on image and music databases validate the efficacy and efficiency of the proposed framework.
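The new ranking scheme described above modifies instance rankings based on a weighted average of relevance feature values. A minimal sketch of such a re-ranking step follows; the interface, instance names, and weights are hypothetical, not taken from the paper:

```python
def rerank(scores, weights):
    """Order instances by the weighted average of their relevance
    feature values (higher = more relevant). `scores` maps an instance
    id to its list of relevance feature values; `weights` holds one
    weight per feature."""
    wsum = sum(weights)

    def weighted_avg(vals):
        return sum(w * v for w, v in zip(weights, vals)) / wsum

    return sorted(scores, key=lambda k: weighted_avg(scores[k]), reverse=True)

# Three instances, two relevance features each; the first feature is
# weighted more heavily, so img1 outranks img2.
ranked = rerank(
    {"img1": [0.9, 0.2], "img2": [0.4, 0.8], "img3": [0.1, 0.1]},
    weights=[0.7, 0.3],
)
```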
Smart phone based machine condition monitoring system
- Authors: Gondal, Iqbal , Yaqub, Muhammad , Hua, Xueliang
- Date: 2012
- Type: Text , Conference paper
- Relation: 19th International Conference on Neural Information Processing p. 488-497
- Full Text: false
- Reviewed:
- Description: Machine condition monitoring has gained momentum over the years and is becoming an essential component of today’s industrial units. A cost-effective machine condition monitoring system is needed for predictive maintenance. In this paper, we develop a machine condition monitoring system using a smart phone, taking advantage of the rapidly growing smart-phone market in both scale and computational power. In spite of certain hardware limitations, the proposed system is able to acquire data, build the fault diagnostic model, and determine the type of fault in the case of unknown fault signatures. Results for fault detection accuracy are presented which validate the prospects of the proposed framework for future condition monitoring services.
Sustaining the future through virtual worlds
- Authors: Gregory, Sue , Gregory, Brent , Hillier, Mathew , Miller, Charlynn , Meredith, Grant
- Date: 2012
- Type: Text , Conference paper
- Relation: Future Challenges, Sustainable Futures p. 361-368
- Full Text:
- Reviewed:
- Description: Virtual worlds (VWs) continue to be used extensively in Australia and New Zealand higher education institutions although the tendency towards making unrealistic claims of efficacy and popularity appears to be over. Some educators at higher education institutions continue to use VWs in the same way as they have done in the past; others are exploring a range of different VWs or using them in new ways; whilst some are opting out altogether. This paper presents an overview of how 46 educators from some 26 institutions see VWs as an opportunity to sustain higher education. The positives and negatives of using VWs are discussed.
Unitary anomaly detection for ubiquitous safety in machine health monitoring
- Authors: Amar, Muhammad , Gondal, Iqbal , Wilson, Campbell
- Date: 2012
- Type: Text , Conference paper
- Relation: 19th International Conference on Neural Information Processing (ICONIP) p. 361-368
- Full Text: false
- Reviewed:
- Description: Safety has always been of vital concern in both industrial and home applications. Ensuring safety often requires certain quantifications regarding the inclusive behavior of the system under observation in order to determine deviations from normal behavior. In machine health monitoring, the vibration signal is of great importance for such measurements because it includes abundant information from several machine parts and surroundings that can influence machine behavior. This paper proposes a unitary anomaly detection technique (UAD) that, upon observation of abnormal behavior in the vibration signal, can trigger an alarm with an adjustable threshold in order to meet different safety requirements. The normalized spectral amplitudes of the quasi-stationary vibration signal are divided into frequency bins, and the summed amplitudes over each bin are used as features. From a training set consisting of normal vibration signals, Gaussian distribution models are obtained for each feature, which are then used for anomaly detection.
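The feature-extraction and alarm steps described in the abstract can be sketched roughly as follows. The bin count, thresholding rule, and toy spectra are illustrative assumptions, not the authors' implementation:

```python
from statistics import mean, stdev

def bin_features(spectrum, n_bins):
    """Sum normalized spectral amplitudes over equal-width frequency bins."""
    total = sum(spectrum)
    norm = [a / total for a in spectrum]
    size = len(norm) // n_bins
    return [sum(norm[i * size:(i + 1) * size]) for i in range(n_bins)]

def fit_gaussians(normal_spectra, n_bins):
    """Fit one Gaussian (mean, std) per bin feature from normal-condition
    signals; a tiny floor keeps a zero std from dividing by zero."""
    feats = [bin_features(s, n_bins) for s in normal_spectra]
    return [(mean(col), stdev(col) or 1e-9) for col in zip(*feats)]

def is_anomalous(spectrum, models, threshold):
    """Raise an alarm when any bin feature deviates from its fitted mean
    by more than `threshold` standard deviations."""
    feats = bin_features(spectrum, len(models))
    return any(abs(f - mu) / sigma > threshold
               for f, (mu, sigma) in zip(feats, models))

# Toy spectra: 8 amplitude values summarised into 4 bin features.
normal = [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1]]
models = fit_gaussians(normal, n_bins=4)
```

The adjustable `threshold` plays the role of the paper's tunable alarm level: lower values trade more false alarms for earlier fault detection.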
A basic theory of intelligent finance
- Authors: Pan, Heping
- Date: 2011
- Type: Text , Journal article
- Relation: New Mathematics and Natural Computation Vol. 7, no. 2 (May 2011), p. 197-227
- Full Text: false
- Reviewed:
- Description: This paper presents a basic theory of intelligent finance as a new paradigm of financial investment. It is assumed that the financial market is always in a state of swing between efficient and inefficient modes on multiple levels of time scale; it is possible to go beyond the efficient market theory to study the dynamic evolving process of the market between equilibrium and far-from-equilibrium; there are robust dynamic patterns in this evolving process, which may be exploitable via intelligent trading systems. On the foundation of the four principles - comprehensive, predictive, dynamic and strategic, the basic theory takes the information sources into the loop as the starting points for all the market analysis, introducing the scale space of time into the pricing process analysis in order to detect and capture trends, cycles and seasonality on multiple intrinsic levels of time scale which are then used as the dynamic basis for constructing and managing portfolios. In stock markets, the theory exhibits itself in the form of an Intelligent Dynamic Portfolio Theory, which integrates predictive modeling of a bull-bear market cycle, sector rotation, and portfolio optimization with a reactive trend following trading strategy.
A parametric approach to list decoding of Reed-Solomon codes using interpolation
- Authors: Ali, Mortuza , Kuijper, Margreta
- Date: 2011
- Type: Text , Journal article
- Relation: IEEE Transactions on Information Theory Vol. 57, no. 10 (2011), p. 6718-6728
- Full Text: false
- Reviewed:
- Description: In this paper, we present a minimal list decoding algorithm for Reed-Solomon (RS) codes. Minimal list decoding for a code C refers to list decoding with radius d_min, where d_min is the minimum of the distances between the received word and any codeword in C. We consider the problem of determining the value of d_min as well as determining all the codewords at distance d_min. Our approach involves a parametrization of interpolating polynomials of a minimal Gröbner basis G. We present two efficient ways to compute G. We also show that so-called re-encoding can be used to further reduce the complexity. We then demonstrate how our parametric approach can be solved by a computationally feasible rational curve fitting solution from a recent paper by Wu. Besides, we present an algorithm to compute the minimum multiplicity as well as the optimal values of the parameters associated with this multiplicity, which results in overall savings in both memory and computation.
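The minimal decoding radius referred to above can be written out explicitly. The symbol names below (C for the code, r for the received word, d_H for Hamming distance) are reconstructions chosen for this listing, since the original notation was lost in extraction:

```latex
d_{\min} \;=\; \min_{c \in C} d_H(r, c)
```

Minimal list decoding then returns every codeword c with d_H(r, c) = d_min, i.e. all nearest codewords to the received word.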
An improved method to infer gene regulatory network using s-system
- Authors: Chowdhury, Ahsan , Chetty, Madhu
- Date: 2011
- Type: Text , Conference paper
- Relation: IEEE Congress on Evolutionary Computation (IEEE CEC) p. 1012-1019
- Full Text: false
- Reviewed:
- Description: Gene Regulatory Networks (GRNs) play an important role in the understanding of complex biological systems. In most cases, high-throughput microarray gene expression data is used for finding these regulatory relationships among genes. In this paper, we present a novel approach, based on a decoupled S-System model, for reverse engineering GRNs. In the proposed method, the genetic algorithm used for scoring the networks contains several useful features for accurate network inference, namely a Prediction Initialization (PI) algorithm to initialize the individuals, a Flip Operation (FO) for better mating of values and a restricted execution of Hill Climbing Local Search over few individuals. It also includes a novel refinement technique which utilizes the fit solutions of the genetic algorithm for optimizing sensitivity and specificity of the inferred network. Comparative studies and robustness analysis using a standard benchmark data set show the superiority of the proposed method.
Application of soft computing to predict blast-induced ground vibration
- Authors: Khandelwal, Manoj , Kumar, Lalit , Yellishetty, Mohan
- Date: 2011
- Type: Text , Journal article
- Relation: Engineering with Computers Vol. 27, no. 2 (2011), p. 117-125
- Full Text: false
- Reviewed:
- Description: In this study, an attempt has been made to evaluate and predict the blast-induced ground vibration by incorporating explosive charge per delay and distance from the blast face to the monitoring point using artificial neural network (ANN) technique. A three-layer feed-forward back-propagation neural network with 2-5-1 architecture was trained and tested using 130 experimental and monitored blast records from the surface coal mines of Singareni Collieries Company Limited, Kothagudem, Andhra Pradesh, India. Twenty new blast data sets were used for the validation and comparison of the peak particle velocity (PPV) by ANN and conventional vibration predictors. Results were compared based on coefficient of determination and mean absolute error between monitored and predicted values of PPV. © 2009 Springer-Verlag London Limited.
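The 2-5-1 architecture named in the abstract (two inputs: charge per delay and distance; five hidden units; one output: PPV) amounts to the forward pass below. The sigmoid activation, linear output, and random placeholder weights are illustrative assumptions; back-propagation training on the blast records is omitted:

```python
import math
import random

random.seed(0)

def forward(x, w1, b1, w2, b2):
    """One forward pass through a 2-5-1 feed-forward network:
    sigmoid hidden layer, linear output unit."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + b)))
        for row, b in zip(w1, b1)
    ]
    return sum(wo * h for wo, h in zip(w2, hidden)) + b2

# Untrained placeholder weights: 5 hidden units, each fed by the two
# (scaled) inputs - charge per delay and distance to the monitoring point.
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
b1 = [random.uniform(-1, 1) for _ in range(5)]
w2 = [random.uniform(-1, 1) for _ in range(5)]
b2 = 0.0

ppv = forward([0.6, 0.3], w1, b1, w2, b2)  # predicted (scaled) PPV
```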
Automatic image search based on improved feature descriptors and decision tree
- Authors: Hou, Jin , Chen, Zeng , Qin, Xue , Zhang, Dengsheng
- Date: 2011
- Type: Text , Journal article
- Relation: Integrated Computer-Aided Engineering Vol. 18, no. 2 (2011), p. 167-180
- Full Text: false
- Reviewed:
- Description: There has been a growing interest in implementing image search engine at the semantic level. However, most existing practical systems including popular commercial image search engines like Google and Yahoo! are either text-based or a simple hybrid of texts and visual features. This paper proposes a novel image search system based on automatic image annotation. We develop a technology which learns semantic image concepts from image contents and transforms unstructured images into textual documents, so that images are indexed and retrieved in the same way as textual documents. Existing database management systems can be used to effectively manage image contents, and image search can be as efficient as text search by transforming images into textual documents through machine learning. Experiments in both the Corel dataset and real Web dataset are performed to validate our system and the results are promising. This system suggests a new combination of texts and visual features in order to achieve a semantic image search, and is expected to become a re-ranking system to the existing image search result via the Internet.
Blast-induced ground vibration prediction using support vector machine
- Authors: Khandelwal, Manoj
- Date: 2011
- Type: Text , Journal article
- Relation: Engineering with Computers Vol. 27, no. 3 (2011), p. 193-200
- Full Text: false
- Reviewed:
- Description: Ground vibrations induced by blasting are one of the fundamental problems in the mining industry and may cause severe damage to structures and plants nearby. Therefore, a vibration control study plays an important role in the minimization of environmental effects of blasting in mines. In this paper, an attempt has been made to predict the peak particle velocity using support vector machine (SVM) by taking into consideration the maximum charge per delay and the distance from the blast face to the monitoring point. To investigate the suitability of this approach, the predictions by SVM have been compared with conventional vibration predictor equations. Coefficient of determination (CoD) and mean absolute error were taken as performance measures. © 2010 Springer-Verlag London Limited.
Building sparse support vector machines for multi-instance classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD) p. 471-486
- Full Text: false
- Reviewed:
- Description: We propose a direct approach to learning sparse Support Vector Machine (SVM) prediction models for Multi-Instance (MI) classification. The proposed sparse SVM is based on a “label-mean” formulation of MI classification which takes the average of predictions of individual instances for bag-level prediction. This leads to a convex optimization problem, which is essential for the tractability of the optimization problem arising from the sparse SVM formulation we derived subsequently, as well as the validity of the optimization strategy we employed to solve it. Based on the “label-mean” formulation, we can build sparse SVM models for MI classification and explicitly control their sparsities by enforcing the maximum number of expansions allowed in the prediction function. An effective optimization strategy is adopted to solve the formulated sparse learning problem which involves the learning of both the classifier and the expansion vectors. Experimental results on benchmark data sets have demonstrated that the proposed approach is effective in building very sparse SVM models while achieving comparable performance to the state-of-the-art MI classifiers.
Classification through incremental max-min separability
- Authors: Bagirov, Adil , Ugon, Julien , Webb, Dean , Karasozen, Bulent
- Date: 2011
- Type: Text , Journal article
- Relation: Pattern Analysis and Applications Vol. 14, no. 2 (2011), p. 165-174
- Relation: http://purl.org/au-research/grants/arc/DP0666061
- Full Text: false
- Reviewed:
- Description: Piecewise linear functions can be used to approximate non-linear decision boundaries between pattern classes. Piecewise linear boundaries are known to provide efficient real-time classifiers. However, they require a long training time. Finding piecewise linear boundaries between sets is a difficult optimization problem. Most approaches use heuristics to avoid solving this problem, which may lead to suboptimal piecewise linear boundaries. In this paper, we propose an algorithm for globally training hyperplanes using an incremental approach. Such an approach allows one to find a near global minimizer of the classification error function and to compute as few hyperplanes as needed for separating sets. We apply this algorithm for solving supervised data classification problems and report the results of numerical experiments on real-world data sets. These results demonstrate that the new algorithm requires a reasonable training time and its test set accuracy is consistently good on most data sets compared with mainstream classifiers. © 2010 Springer-Verlag London Limited.
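A max-min piecewise linear decision function of the kind this line of work optimizes can be evaluated as sketched below. The hyperplane groups are hypothetical toy values, not the output of the proposed incremental algorithm:

```python
def maxmin_decision(x, groups):
    """Evaluate a max-min piecewise linear decision function
    f(x) = max over groups j of (min over hyperplanes i of (w_ij . x + b_ij)).
    A point x is on the positive side when f(x) > 0; each inner min carves
    out one convex piece, and the outer max unions the pieces."""
    def affine(w, b):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(min(affine(w, b) for w, b in group) for group in groups)

# Two groups of hyperplanes carve a non-convex positive region in 2-D:
groups = [
    [((1.0, 0.0), -1.0), ((0.0, 1.0), -1.0)],    # x > 1 and y > 1
    [((-1.0, 0.0), -1.0), ((0.0, -1.0), -1.0)],  # x < -1 and y < -1
]
```

The incremental scheme in the paper grows such a collection of hyperplanes one at a time, which is why as few hyperplanes as needed are computed.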
Clustering gene expression data using ant-based heuristics
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2011
- Type: Text , Conference paper
- Relation: IEEE Congress on Evolutionary Computation (IEEE CEC) 2011 p. 1-8
- Full Text: false
- Reviewed:
- Description: We consider the problem of finding the clusters in novel datasets in which the number of clusters is not known a priori, and little or no additional information is available for users to adjust the parameters in a clustering algorithm. We address this problem using a stochastic algorithm named SATTA (Simplified Adaptive Time Dependent Transporter), which attempts to find clusters without requiring users to specify the number of clusters or adjust any parameters. SATTA is then compared with Expectation Maximization Clustering (EMC), which is also able to estimate the number of clusters using the principle of maximum likelihood and find the underlying clusters without any human intervention. Our results on seven gene expression datasets show that SATTA significantly outperforms Expectation Maximization Clustering in terms of clustering accuracy and efficiency. We discuss the conceptual differences between SATTA and EMC, which suggest that SATTA is a more promising alternative when little or no additional information is available for clustering novel datasets.
Density estimation based on mass
- Authors: Ting, Kaiming , Washio, Takashi , Wells, Jonathan , Liu, Fei
- Date: 2011
- Type: Text , Conference paper
- Relation: 11th IEEE International Conference on Data Mining (ICDM 2011) p. 715-724
- Full Text: false
- Reviewed:
- Description: Density estimation is the ubiquitous base modelling mechanism employed for many tasks such as clustering, classification, anomaly detection and information retrieval. Commonly used density estimation methods such as kernel density estimator and k-nearest neighbour density estimator have high time and space complexities which render them inapplicable in problems with large data size and even a moderate number of dimensions. This weakness sets the fundamental limit in existing algorithms for all these tasks. We propose the first density estimation method which stretches this fundamental limit to an extent that dealing with millions of data can now be done easily and quickly. We analyze the error of the new estimation (from the true density) using a bias-variance analysis. We then perform an empirical evaluation of the proposed method by replacing existing density estimators with the new one in two current density-based algorithms, namely, DBSCAN and LOF. The results show that the new density estimation method significantly improves the runtime of DBSCAN and LOF, while maintaining or improving their task-specific performances in clustering and anomaly detection, respectively. The new method empowers these algorithms, currently limited to small data size only, to process very large databases - setting a new benchmark for what density-based algorithms can achieve.
Empirical evaluation methods for multiobjective reinforcement learning algorithms
- Authors: Vamplew, Peter , Dazeley, Richard , Berry, Adam , Issabekov, Rustam , Dekker, Evan
- Date: 2011
- Type: Text , Journal article
- Relation: Machine Learning Vol. 84, no. 1-2 (2011), p. 51-80
- Full Text: false
- Reviewed:
- Description: While a number of algorithms for multiobjective reinforcement learning have been proposed, and a small number of applications developed, there has been very little rigorous empirical evaluation of the performance and limitations of these algorithms. This paper proposes standard methods for such empirical evaluation, to act as a foundation for future comparative studies. Two classes of multiobjective reinforcement learning algorithms are identified, and appropriate evaluation metrics and methodologies are proposed for each class. A suite of benchmark problems with known Pareto fronts is described, and future extensions and implementations of this benchmark suite are discussed. The utility of the proposed evaluation methods is demonstrated via an empirical comparison of two example learning algorithms. © 2010 The Author(s).
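One standard metric for comparing an algorithm's front against a known Pareto front is the hypervolume: the area (in two objectives) that a front dominates relative to a reference point. A sketch for the two-objective maximization case, assuming the input points are already mutually non-dominated and all dominate the reference point (whether this matches the paper's exact metric definitions is not stated here):

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-D Pareto front relative to reference point
    `ref`, with both objectives maximized. Sweeping from the largest
    first objective downward, each point adds the rectangle between its
    second objective and the best second objective seen so far."""
    area, prev_y = 0.0, ref[1]
    for x, y in sorted(front, reverse=True):  # descending in objective 1
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area

# Three non-dominated points measured against the origin.
hv = hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(0, 0))
```

A larger hypervolume means the learner's front covers more of the objective space, which makes the metric usable as a single scalar for the comparative studies the paper calls for.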