Enhanced polyphonic music genre classification using high level features
- Authors: Arabi, Arash , Lu, Guojun
- Date: 2009
- Type: Text , Conference paper
- Relation: Proceedings of the 2009 IEEE International Conference on Signal and Image Processing Applications p. 1-6
- Full Text: false
- Reviewed:
- Description: The task of classifying the genre of polyphonic music signals is traditionally done using only low level features of the signal. In this paper, high level features are applied to improve music genre classification. We propose the use of statistical chord features and chord progression information in conjunction with low level features. The chord progression information is manifested in genre probability descriptors calculated using a pattern matching algorithm. Our proposed method provides an improvement of 12.4% in the classification results over a commonly compared technique.
A class centric feature and classifier ensemble selection approach for music genre classification
- Authors: Ariyaratne, Hasitha Bimsara , Zhang, Dengsheng , Lu, Guojun
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop SSPR & SPR 2012 p. 666-674
- Full Text: false
- Reviewed:
- Description: Music genre classification has attracted a lot of research interest due to the rapid growth of digital music. Despite the availability of a vast number of audio features and classification techniques, genre classification still remains a challenging task. In this work we propose a class centric feature and classifier ensemble selection method which deviates from the conventional practice of employing a single, or an ensemble of classifiers trained with a selected set of audio features. We adopt a binary decomposition technique to divide the multiclass problem into a set of binary problems which are then treated in a class specific manner. This differs from the traditional techniques which operate on the naive assumption that a specific set of features and/or classifiers can perform equally well in identifying all the classes. Experimental results obtained on a popular genre dataset and a newly created dataset suggest significant improvements over traditional techniques.
A comparative study on contour-based corner detectors
- Authors: Awrangjeb, Mohammad , Lu, Guojun , Fraser, Clive
- Date: 2010
- Type: Text , Conference paper
- Relation: Digital Image Computing: Techniques and Applications (DICTA), 2010 International Conference
- Full Text: false
- Reviewed:
- Description: Contour-based corner detectors directly or indirectly estimate a significance measure (e.g. curvature) on the points of a planar curve and select the curvature extrema points as corners. While an extensive number of contour-based corner detectors have been proposed over the last four decades, there is no comparative study of recently proposed promising detectors. This paper is an attempt to fill this gap. We present the general framework of the contour-based corner detection technique and discuss two major issues - curve smoothing and curvature estimation - which have major impacts on corner detection performance. A number of promising detectors are compared using an automatic evaluation system on a common large dataset. It is observed that while the detectors using indirect curvature estimation techniques are more robust, the detectors using direct curvature estimation techniques are faster.
A fast corner detector based on the chord-to-point distance accumulation technique
- Authors: Awrangjeb, Mohammad , Lu, Guojun , Fraser, Clive , Ravanbakhsh, Mehdi
- Date: 2009
- Type: Text , Conference paper
- Relation: Digital Image Computing: Techniques and Applications, 2009. DICTA '09. p. 519-525
- Full Text: false
- Reviewed:
- Description: The previously proposed contour-based multi-scale corner detector based on the chord-to-point distance accumulation (CPDA) technique has proved its superior robustness over many other single- and multi-scale detectors. However, the original CPDA detector is computationally expensive, since it calculates the CPDA discrete curvature at each point of the curve. The proposed improvement obtains a set of probable candidate points before the CPDA curvature estimation; the CPDA curvature is then estimated on these chosen candidate points only. Consequently, the improved CPDA detector becomes faster, while retaining a similar robustness to the original CPDA detector.
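As a rough illustration of the CPDA curvature that both the original and improved detectors rely on, the sketch below accumulates, for each curve point, the perpendicular distances to chords of a single assumed span that cover it. This is illustrative only: the actual detector combines several chord lengths and adds the candidate-point pre-selection described in the abstract.

```python
import numpy as np

def cpda_curvature(curve, L=5):
    """Chord-to-point distance accumulation for a single chord span L.

    curve: (N, 2) array of points on a planar curve.  Returns an (N,)
    array of accumulated chord-to-point distances; local maxima are
    corner candidates.  Illustrative sketch only.
    """
    n = len(curve)
    h = np.zeros(n)
    for i in range(n):
        p = curve[i]
        # Consider every chord p_j -> p_{j+L} that spans point i.
        for j in range(max(i - L + 1, 0), min(i, n - L - 1) + 1):
            a, b = curve[j], curve[j + L]
            chord = b - a
            norm = np.hypot(chord[0], chord[1])
            if norm == 0:
                continue
            # Perpendicular distance from p to the line through a and b.
            h[i] += abs(chord[0] * (p[1] - a[1]) - chord[1] * (p[0] - a[0])) / norm
    return h
```

On a right-angle polyline, the bend accumulates positive distances while points in the middle of a straight segment score zero, which is what makes curvature extrema usable as corner candidates.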
A triangulation-based technique for building boundary identification from point cloud data
- Authors: Awrangjeb, Mohammad , Lu, Guojun
- Date: 2016
- Type: Text , Conference proceedings , Conference paper
- Relation: 2015 International Conference on Image and Vision Computing New Zealand, IVCNZ 2015; Auckland, New Zealand; 23rd-24th November 2015 Vol. 2016-November, p. 1-6
- Full Text: false
- Reviewed:
- Description: Building boundary identification is an essential prerequisite to building outline generation from point cloud data. In this problem, the boundary edges that constitute the building boundary are identified. Existing solutions to the identification of boundary edges from the input point set have one or more of the following problems: they are ineffective at finding appropriate edges in a concave shape, incapable of separately determining a 'hole' or 'concavity' inside the shape, dependent on additional information such as the scan direction that may be unavailable, or unable to determine the boundary of a point set from the boundaries of two or more subsets of that point set. This paper proposes a new solution to the identification of the building boundary using the maximum point-to-point distance in the input data. It properly detects the boundary edges for any type of shape and separately recognises holes, if any, inside the shape. The unique feature of the proposed solution is that it can identify the boundary of a point set from the boundaries of two or more subsets of the point set. It does not require any additional information other than the input point set. Experimental results show that the proposed solution can preserve details along the building boundary and offer high area-based completeness and quality, even in low-density input data.
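The paper's exact distance criterion is not reproduced here, but the general triangulation-based idea can be sketched with an alpha-shape-style rule: triangulate the points, discard triangles with any edge longer than an assumed threshold `d_max`, and keep the edges that border exactly one remaining triangle.

```python
import numpy as np
from scipy.spatial import Delaunay

def boundary_edges(points, d_max):
    """Boundary of a 2-D point set via Delaunay triangulation.

    Triangles with any edge longer than d_max are pruned; an edge used
    by exactly one surviving triangle lies on the boundary (of the
    outline or of a hole).  Illustrative sketch, not the paper's
    maximum point-to-point distance criterion.
    """
    tri = Delaunay(points)
    edge_count = {}
    for simplex in tri.simplices:
        pts = points[simplex]
        lengths = [np.linalg.norm(pts[a] - pts[b])
                   for a, b in ((0, 1), (1, 2), (2, 0))]
        if max(lengths) > d_max:
            continue  # prune triangles spanning empty space
        for a, b in ((0, 1), (1, 2), (2, 0)):
            e = tuple(sorted((int(simplex[a]), int(simplex[b]))))
            edge_count[e] = edge_count.get(e, 0) + 1
    return [e for e, c in edge_count.items() if c == 1]
```

For a square of corner points with one interior point, the four perimeter edges each belong to a single triangle and are returned, while the interior 'spoke' edges are shared by two triangles and suppressed.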
Building roof plane extraction from LIDAR data
- Authors: Awrangjeb, Mohammad , Lu, Guojun
- Date: 2013
- Type: Text , Conference paper
- Relation: 2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)
- Full Text:
- Reviewed:
- Description: This paper presents a new segmentation technique to use LIDAR point cloud data for automatic extraction of building roof planes. The raw LIDAR points are first classified into two major groups: ground and non-ground points. The ground points are used to generate a 'building mask' in which the black areas represent the ground where there are no laser returns below a certain height. The non-ground points are segmented to extract the planar roof segments. First, the building mask is divided into small grid cells. The cells containing the black pixels are clustered such that each cluster represents an individual building or tree. Second, the non-ground points within a cluster are segmented based on their coplanarity and neighbourhood relations. Third, the planar segments are refined using a rule-based procedure that assigns the common points among the planar segments to the appropriate segments. Finally, another rule-based procedure is applied to remove tree planes which are generally small in size and randomly oriented. Experimental results on three Australian sites have shown that the proposed method offers high building detection and roof plane extraction rates.
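The coplanarity-based segmentation step described above can be illustrated with a simple region-growing sketch. The thresholds and neighbourhood radius below are assumed values, and the plain least-squares plane fit stands in for the paper's rule-based refinement pipeline.

```python
import numpy as np

def grow_plane(points, seed, dist_thresh=0.2, radius=1.5):
    """Grow one planar segment from a seed point by coplanarity.

    A neighbouring point joins the segment if its perpendicular
    distance to the segment's current plane fit is small.
    Illustrative sketch only; thresholds are assumptions.
    """
    segment = {seed}
    frontier = [seed]
    while frontier:
        i = frontier.pop()
        seg = points[list(segment)]
        centroid = seg.mean(axis=0)
        if len(seg) >= 3:
            # Plane normal = right singular vector of the smallest
            # singular value of the centred segment points.
            normal = np.linalg.svd(seg - centroid)[2][-1]
        else:
            normal = np.array([0.0, 0.0, 1.0])  # bootstrap assumption
        d = np.linalg.norm(points - points[i], axis=1)
        for j in np.flatnonzero(d < radius):
            if j in segment:
                continue
            # Coplanarity test: perpendicular distance to the plane.
            if abs(np.dot(points[j] - centroid, normal)) < dist_thresh:
                segment.add(int(j))
                frontier.append(int(j))
    return segment
```

Growing from a point on a flat roof patch collects the whole coplanar neighbourhood while leaving points at other heights (e.g. vegetation returns) outside the segment.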
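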
A performance review of recent corner detectors
- Authors: Awrangjeb, Mohammad , Lu, Guojun
- Date: 2013
- Type: Text , Conference paper
- Relation: International Conference on Digital Image Computing: Techniques and Applications, 26 November 2013 to 28 November 2013 p. 157-164
- Full Text:
- Reviewed:
- Description: Contour-based corner detectors directly or indirectly estimate a significance measure (e.g. curvature) on the points of a planar curve and select the curvature extrema points as corners. A number of promising contour-based corner detectors have recently been proposed. They mainly differ in how the curvature is estimated at each point of the given curve. As the curvature on a digital curve can only be approximated, it is important to estimate a curvature that remains stable under significant noise on the curve, for example from geometric transformations and compression. Moreover, in many applications, for instance content-based image retrieval, a fast corner detector is a prerequisite, so the time a detector takes to find corners in a given image is also a primary characteristic. In addition, different authors have evaluated their detectors on different platforms using different evaluation systems. Evaluation systems that depend on human judgement and visual identification of corners are manual and too subjective, and applying a manual system to a large test database is expensive. Therefore, it is important to evaluate the detectors on a common platform using an automatic evaluation system. This paper first reviews the six most recent and best-performing corner detectors and analyses their theoretical running time. It then uses an automatic evaluation system to analyse their performance. Both robustness to noise and efficiency are estimated to rank the detectors.
Integration of LIDAR data and orthoimage for automatic 3D building roof plane extraction
- Authors: Awrangjeb, Mohammad , Fraser, Clive , Lu, Guojun
- Date: 2013
- Type: Text , Conference paper
- Relation: 2013 IEEE International Conference on Multimedia and Expo (ICME)
- Full Text:
- Reviewed:
- Description: Automatic 3D extraction of building roofs from remotely sensed data is important for many applications including city modeling. This paper proposes a new method for automatic 3D roof extraction through an effective integration of LIDAR (Light Detection And Ranging) data and multispectral orthoimagery. Using the ground height from a DEM (Digital Elevation Model), the raw LIDAR points are separated into two groups. The first group contains the ground points, which are used to constitute a 'ground mask'. The second group contains the non-ground points, which are segmented using an innovative image-line-guided segmentation technique to extract the roof planes. The image lines extracted from the grey-scale version of the orthoimage are classified into several classes such as 'ground', 'tree', 'roof edge' and 'roof ridge' using the ground mask together with colour and texture information from the orthoimagery. During roof plane extraction, the lines from the latter two classes are used to fit roof planes to the neighbouring non-ground LIDAR points. Finally, a new rule-based procedure is applied to remove planes constructed on trees. Experimental results show that the proposed method successfully removes vegetation and offers high extraction rates.
Music emotion annotation by machine learning
- Authors: Cheung, Wai , Lu, Guojun
- Date: 2008
- Type: Text , Conference paper
- Relation: Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing p. 580-585
- Full Text: false
- Reviewed:
- Description: Music emotion annotation is the task of attaching emotional terms to musical works. As the volume of online musical content has expanded rapidly in recent years, demand for retrieval by emotion is emerging. Currently, literature on music retrieval using emotional terms is rare. Emotion-annotated data are scarce in existing music databases because annotation is still a manual task. Automating music emotion annotation is an essential prerequisite to research in music retrieval by emotion, for without it even sophisticated retrieval methods may not be very useful in a data-deficient environment. This paper describes a machine learning approach to annotating music with a large number of emotional terms. We also estimate the training data size requirements for a workable annotation system. Our empirical results show that 1) the task of music emotion annotation can be modelled using machine learning techniques to support a large number of emotional terms, 2) the combination of a sampling method and a data-driven detection threshold is highly effective in optimizing the use of existing annotated data in training machine learning models, 3) synonymous relationships enhance the annotation performance, and 4) the training data size requirement is within reach for a workable annotation system. Essentially, automatic music emotion annotation enables music retrieval by emotion to be performed as a text retrieval task.
On low-rank regularized least squares for scalable nonlinear classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: International Conference on Neural Information Processing p. 490-499
- Full Text: false
- Reviewed:
- Description: In this paper, we revisit the classical technique of Regularized Least Squares (RLS) for the classification of large-scale nonlinear data. Specifically, we focus on a low-rank formulation of RLS and show that it has linear time complexity in the data size only and does not rely on the number of labels and features for problems with moderate feature dimension. This makes low-rank RLS particularly suitable for classification with large data sets. Moreover, we propose a general theorem for the closed-form solutions to the Leave-One-Out Cross Validation (LOOCV) estimation problem in empirical risk minimization, which encompasses all types of RLS classifiers as special cases. This eliminates the reliance on cross validation, a computationally expensive process for parameter selection, and greatly accelerates the training process of RLS classifiers. Experimental results on real and synthetic large-scale benchmark data sets show that low-rank RLS achieves comparable classification performance while being much more efficient than standard kernel SVM for nonlinear classification. The improvement in efficiency is more evident for data sets with higher dimensions.
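The closed-form LOOCV idea can be illustrated for plain linear RLS (ridge regression), where the standard hat-matrix identity e_i = (y_i - yhat_i) / (1 - H_ii) yields every leave-one-out residual without retraining a single model; the paper's theorem generalizes this kind of result to a family of RLS classifiers, so the sketch below is the textbook special case, not the paper's formulation.

```python
import numpy as np

def loocv_error(X, y, lam):
    """Closed-form leave-one-out mean squared error for linear RLS.

    H = X (X'X + lam I)^{-1} X' is the hat matrix; the identity
    e_i = (y_i - yhat_i) / (1 - H_ii) gives the residual that the
    model trained WITHOUT example i would make on example i.
    """
    n, d = X.shape
    H = X @ np.linalg.inv(X.T @ X + lam * np.eye(d)) @ X.T
    residuals = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(residuals ** 2)
```

This is why LOOCV-based parameter selection for RLS costs no more than a single fit per candidate regularizer, which is the efficiency the abstract refers to.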
Learning naive Bayes classifiers for music classification and retrieval
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 20th International Conference on Pattern Recognition p. 4589-4592
- Full Text: false
- Reviewed:
- Description: In this paper, we explore the use of naive Bayes classifiers for music classification and retrieval. The motivation is to employ all audio features extracted from local windows for classification instead of just using a single song-level feature vector produced by compressing the local features. Two variants of naive Bayes classifiers are studied based on the extensions of standard nearest neighbor and support vector machine classifiers. Experimental results have demonstrated superior performance achieved by the proposed naive Bayes classifiers for both music classification and retrieval as compared to the alternative methods.
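A minimal sketch of the core idea of classifying a whole song from all of its local window features rather than one compressed song-level vector, assuming a simple diagonal-Gaussian class model (the paper's actual variants extend nearest-neighbour and SVM classifiers instead):

```python
import numpy as np

def fit_gaussians(windows_by_class):
    """Per-class diagonal Gaussian over local window feature vectors."""
    return {c: (X.mean(axis=0), X.var(axis=0) + 1e-6)
            for c, X in windows_by_class.items()}

def classify_song(windows, params):
    """Naive Bayes over windows: treat the windows of a song as
    conditionally independent given the class and sum their
    per-window log-likelihoods."""
    best, best_ll = None, -np.inf
    for c, (mu, var) in params.items():
        ll = -0.5 * np.sum((windows - mu) ** 2 / var
                           + np.log(2 * np.pi * var))
        if ll > best_ll:
            best, best_ll = c, ll
    return best
```

Because the song-level score is a sum over windows, every local feature vector contributes evidence, which is the motivation the abstract states for avoiding a single compressed song-level vector.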
Building sparse support vector machines for multi-instance classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: European Conference on Machine Learning Knowledge Discovery in Databases (ECML PKDD) p. 471-486
- Full Text: false
- Reviewed:
- Description: We propose a direct approach to learning sparse Support Vector Machine (SVM) prediction models for Multi-Instance (MI) classification. The proposed sparse SVM is based on a “label-mean” formulation of MI classification which takes the average of predictions of individual instances for bag-level prediction. This leads to a convex optimization problem, which is essential for the tractability of the optimization problem arising from the sparse SVM formulation we derived subsequently, as well as the validity of the optimization strategy we employed to solve it. Based on the “label-mean” formulation, we can build sparse SVM models for MI classification and explicitly control their sparsities by enforcing the maximum number of expansions allowed in the prediction function. An effective optimization strategy is adopted to solve the formulated sparse learning problem which involves the learning of both the classifier and the expansion vectors. Experimental results on benchmark data sets have demonstrated that the proposed approach is effective in building very sparse SVM models while achieving comparable performance to the state-of-the-art MI classifiers.
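The "label-mean" rule itself is simple to state in code: a bag's decision value is the average of the decision values of its instances, and its sign gives the bag label. The sketch below uses a plain linear decision function as a stand-in for the paper's sparse kernel expansion.

```python
import numpy as np

def bag_predict(bags, w, b):
    """Label-mean multi-instance prediction.

    bags: list of (n_i, d) arrays of instances.  Each bag's score is
    the mean of its instances' decision values f(x) = w.x + b;
    sign(score) is the bag label.  Illustrative linear stand-in for
    the sparse kernel classifier in the paper.
    """
    scores = [np.mean(X @ w + b) for X in bags]
    return np.sign(scores)
```

Averaging instance predictions keeps the bag-level loss convex in the classifier parameters, which is the property the abstract leans on for tractable sparse learning.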
Optimizing cepstral features for audio classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Conference paper
- Relation: International Joint Conference on Artificial Intelligence p. 1330-1336
- Full Text: false
- Reviewed:
- Description: Cepstral features have been widely used in audio applications. Domain knowledge has played an important role in designing the different types of cepstral features proposed in the literature. In this paper, we present a novel approach for learning optimized cepstral features directly from audio data to better discriminate between different categories of signals in classification tasks. We employ multi-layer feedforward neural networks to model the cepstral feature extraction process. The network weights are initialized to replicate a reference cepstral feature such as the mel-frequency cepstral coefficients. We then propose an embedded approach that integrates feature learning with the training of a support vector machine (SVM) classifier. A single optimization problem is formulated in which the feature and classifier variables are optimized simultaneously so as to refine the initial features and minimize the classification risk. Experimental results demonstrate the effectiveness of the proposed feature learning approach, which outperforms competing methods by a large margin on benchmark data.
Learning sparse kernel classifiers in the primal
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop, SSPR&SPR 2012; Hiroshima, Japan; 7th-9th November 2012; published in Structural, Syntactic, and Statistical Pattern Recognition (part of the Lecture Notes in Computer Science) Vol. 7626, p. 60-69
- Full Text: false
- Reviewed:
- Description: The increasing number of classification applications in large data sets demands that efficient classifiers be designed not only in training but also for prediction. In this paper, we address the problem of learning kernel classifiers with reduced complexity and improved efficiency for prediction in comparison to those trained by standard methods. A single optimisation problem is formulated for classifier learning which optimises both classifier weights and eXpansion Vectors (XVs) that define the classification function in a joint fashion. Unlike the existing approach of Wu et al., which performs optimisation in the dual formulation, our approach solves the primal problem directly. The primal problem is much more efficient to solve, as it can be converted to the training of a linear classifier in each iteration, which scales linearly with the size of the data set and the number of expansions. This makes our primal approach highly desirable for large-scale applications, where the dual approach is inadequate and prohibitively slow due to the solution of a cubic-time kernel SVM in each iteration. Experimental results have demonstrated the efficiency and effectiveness of the proposed primal approach for learning sparse kernel classifiers that clearly outperform the alternatives.
Robust building roof segmentation using airborne point cloud data
- Authors: Gilani, Syed , Awrangjeb, Mohammad , Lu, Guojun
- Date: 2016
- Type: Text , Conference proceedings , Conference paper
- Relation: 23rd IEEE International Conference on Image Processing, ICIP 2016; Phoenix, United States; 25th-28th September 2016; published in Proceedings - International Conferenec on Image Processing, ICIP Vol. 2016-August, p. 859-863
- Full Text: false
- Reviewed:
- Description: Approximation of geometric features is an essential step in point cloud segmentation and surface reconstruction. Often, planar surfaces are estimated using principal component analysis (PCA), which is sensitive to noise and smooths sharp features; hence, the segmentation results in unreliable reconstructed surfaces. This article presents a point cloud segmentation method for building detection and roof plane extraction. It uses PCA for saliency feature estimation, including surface curvature and point normals. However, the point normals around anisotropic surfaces are approximated using a consistent isotropic sub-neighbourhood obtained by Low-Rank Subspace Clustering with Prior Knowledge (LRSCPK). The developed segmentation technique is tested using two real-world samples and two benchmark datasets. Per-object and per-area completeness and correctness results indicate the robustness of the approach and the quality of the reconstructed surfaces and extracted buildings.
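The PCA-based saliency features mentioned above (point normal and surface-variation curvature) can be sketched from the eigendecomposition of a local neighbourhood's covariance; this is the standard PCA estimate whose noise sensitivity the paper sets out to correct, not the LRSCPK refinement itself.

```python
import numpy as np

def pca_saliency(points, i, k=10):
    """PCA saliency of point i from its k nearest neighbours.

    The normal is the eigenvector of the smallest covariance
    eigenvalue; 'surface variation' curvature is l0 / (l0 + l1 + l2),
    zero on a perfect plane.  Standard estimate, sensitive to noise.
    """
    d = np.linalg.norm(points - points[i], axis=1)
    nbrs = points[np.argsort(d)[:k]]
    cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
    evals, evecs = np.linalg.eigh(cov)      # ascending eigenvalues
    normal = evecs[:, 0]
    curvature = evals[0] / evals.sum()
    return normal, curvature
```

On a flat roof patch the smallest eigenvalue vanishes, giving near-zero curvature and a normal along the vertical, whereas near roof ridges and step edges the estimate degrades, which motivates the isotropic sub-neighbourhood selection in the paper.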
Enhancing the effectiveness of local descriptor based image matching
- Authors: Hossain, Md Tahmid , Teng, Shyh , Zhang, Dengsheng , Lim, Suryani , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018; Canberra, Australia; 10th-13th December 2018 p. 1-8
- Full Text: false
- Reviewed:
- Description: Image registration has received great attention from researchers over the last few decades. SIFT (Scale Invariant Feature Transform), a local descriptor-based technique, is widely used for registering and matching images. To establish correspondences between images, SIFT uses a Euclidean distance ratio metric. However, this approach leads to many incorrect matches, and eliminating these inaccurate matches has been a challenge. Various methods have been proposed to mitigate this problem. In this paper, we propose a scale and orientation harmony-based pruning method that improves the image matching process by successfully eliminating incorrect SIFT descriptor matches. Moreover, our technique can predict the image transformation parameters based on a novel adaptive clustering method with much higher matching accuracy. Our experimental results show that the proposed method achieves, on average, approximately 16% and 10% higher matching accuracy than the traditional SIFT and a contemporary method, respectively.
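A hedged sketch of the scale/orientation-harmony idea: correct matches between two views of the same scene should agree on one global scale ratio and rotation, so matches outside the dominant (scale, angle) cell can be pruned. The fixed histogram below is an illustrative simplification; the paper uses a novel adaptive clustering method rather than fixed bins.

```python
import numpy as np

def prune_matches(matches, n_bins=10):
    """Keep only matches whose keypoint scale ratio and orientation
    difference fall in the most-voted histogram cell.

    Each match is a dict with assumed keys scale1/scale2 (keypoint
    scales) and ori1/ori2 (orientations in degrees).
    """
    ratios = np.log([m["scale2"] / m["scale1"] for m in matches])
    dthetas = np.array([(m["ori2"] - m["ori1"]) % 360 for m in matches])
    r_bins = np.digitize(ratios, np.linspace(ratios.min(),
                                             ratios.max() + 1e-9, n_bins))
    t_bins = np.digitize(dthetas, np.linspace(0, 360, n_bins))
    pairs = list(zip(r_bins, t_bins))
    best = max(set(pairs), key=pairs.count)   # dominant (scale, angle) cell
    return [m for m, p in zip(matches, pairs) if p == best]
```

Outlier matches from repeated texture tend to scatter across cells, so the dominant-cell vote removes them even before any geometric model is fitted.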
Achieving high multi-modal registration performance using simplified Hough-transform with improved symmetric-SIFT
- Authors: Hossain, Md Tanvir , Teng, Shyh , Lu, Guojun
- Date: 2012
- Type: Text , Conference paper
- Relation: 14th International Conference on Digital Image Computing Techniques and Applications, DICTA 2012
- Full Text: false
- Reviewed:
- Description: The traditional way of using the Hough Transform with SIFT is for the purpose of reliable object recognition. However, it cannot be effectively applied to image registration in the same way, as the recall rate can be significantly lower. In this paper, we propose an alternative implementation of the Hough Transform that can be used with Improved Symmetric-SIFT for multi-modal image registration. Our experimental results show that the proposed technique of applying the Hough Transform can significantly improve key-point matching as well as registration accuracy by utilizing aggregated information from key-points throughout the input images.
Improved symmetric-SIFT for Multi-modal image registration
- Authors: Hossain, Md. Tanvir , Lv, Guohua , Teng, Shyh , Lu, Guojun , Lackmann, Martin
- Date: 2011
- Type: Text , Conference paper
- Relation: 2011 International Conference on Digital Image Computing: Techniques and Applications p. 197-202
- Full Text: false
- Reviewed:
- Description: Multi-modal image registration has received significant research attention over the past decade. Symmetric-SIFT is a recently proposed local description technique that can be used for registering multi-modal images. It is based on a well-known general image registration technique named Scale Invariant Feature Transform (SIFT). Symmetric-SIFT, however, achieves this invariance to multi-modality at the cost of losing important information. In this paper, we show how this loss may adversely affect the accuracy of registration results. We then propose an improvement to Symmetric-SIFT to overcome the problem. Our experimental results show that the proposed technique can improve the number of true matches by up to 10 times and the overall matching accuracy by up to 30%.
An enhancement to SIFT-based techniques for image registration
- Authors: Hossain, Tanvir , Teng, Shyh , Lu, Guojun , Lackmann, Martin
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 2010 Digital Image Computing: Techniques and Applications p. 166-171
- Full Text: false
- Reviewed:
- Description: Symmetric-SIFT is a recently proposed local technique used for registering multimodal images. It is based on a well-known general image registration technique named Scale Invariant Feature Transform (SIFT). Symmetric-SIFT makes use of the gradient magnitude information at the image's key regions to build the descriptors. In this paper, we highlight an issue with how the magnitude information is used in this process. This issue may result in similar descriptors being built to represent regions in images that are visually different. To address this issue, we propose two new strategies for weighting the descriptors. Our experimental results show that Symmetric-SIFT descriptors built using our proposed strategies can lead to better registration accuracy than descriptors built using the original Symmetric-SIFT technique. The issue highlighted and the two strategies proposed are also applicable to the general SIFT technique.