Learning sparse kernel classifiers for multi-instance classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Journal article
- Relation: IEEE Transactions on Neural Networks and Learning Systems Vol. 24, no. 9 (2013), p. 1377-1389
- Full Text: false
- Reviewed:
- Description: We propose a direct approach to learning sparse kernel classifiers for multi-instance (MI) classification to improve efficiency while maintaining predictive accuracy. The proposed method builds on a convex formulation for MI classification that uses the average score of individual instances for bag-level prediction. In contrast, existing formulations use the maximum score of individual instances in each bag, which leads to nonconvex optimization problems. Based on the convex MI framework, we formulate a sparse kernel learning algorithm by imposing additional constraints on the objective function that enforce a maximum number of expansions allowed in the prediction function. The formulated sparse learning problem for MI classification is convex with respect to the classifier weights. Therefore, we can employ an effective optimization strategy to solve the resulting problem, which involves the joint learning of both the classifier and the expansion vectors. In addition, the proposed formulation can explicitly control the complexity of the prediction model while still maintaining competitive predictive performance. Experimental results on benchmark data sets demonstrate that our proposed approach is effective in building very sparse kernel classifiers while achieving performance comparable to the state-of-the-art MI classifiers.
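The contrast between the average-score and maximum-score rules for bag-level prediction can be sketched in a few lines of Python. This is a minimal illustration, assuming a plain linear instance scorer; the function names (`linear_score`, `bag_score_mean`, `bag_score_max`, `hinge`) are hypothetical and not taken from the paper, and kernels and sparsity constraints are omitted.

```python
from typing import Callable, Sequence

Instance = Sequence[float]
Scorer = Callable[[Instance], float]


def linear_score(w: Sequence[float], b: float) -> Scorer:
    """Hypothetical linear instance scorer f(x) = w . x + b."""
    def f(x: Instance) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f


def bag_score_mean(bag: Sequence[Instance], f: Scorer) -> float:
    """Average rule: the bag score is the mean of its instance scores."""
    return sum(f(x) for x in bag) / len(bag)


def bag_score_max(bag: Sequence[Instance], f: Scorer) -> float:
    """Maximum rule used by earlier formulations: the best instance decides."""
    return max(f(x) for x in bag)


def hinge(y: int, score: float) -> float:
    """Hinge loss for a bag label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


bag = [[1.0, 0.0], [0.0, 1.0], [3.0, 1.0]]
f = linear_score([1.0, -1.0], 0.0)
print(bag_score_mean(bag, f))  # instance scores are 1, -1 and 2
print(bag_score_max(bag, f))
```

With a linear scorer, the mean bag score is linear in the weights, so a hinge loss applied to it stays convex. For a positive bag, the hinge on the max score involves 1 minus a convex function of the weights, which is where the nonconvexity of max-based formulations comes from.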
Optimizing cepstral features for audio classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Conference paper
- Relation: International Joint Conference on Artificial Intelligence p. 1330-1336
- Full Text: false
- Reviewed:
- Description: Cepstral features have been widely used in audio applications. Domain knowledge has played an important role in designing the different types of cepstral features proposed in the literature. In this paper, we present a novel approach for learning optimized cepstral features directly from audio data to better discriminate between different categories of signals in classification tasks. We employ multi-layer feedforward neural networks to model the cepstral feature extraction process. The network weights are initialized to replicate a reference cepstral feature such as the mel-frequency cepstral coefficients (MFCCs). We then propose an embedded approach that integrates feature learning with the training of a support vector machine (SVM) classifier. A single optimization problem is formulated in which the feature and classifier variables are optimized simultaneously, so as to refine the initial features and minimize the classification risk. Experimental results demonstrate the effectiveness of the proposed feature learning approach, which outperforms competing methods by a large margin on benchmark data.
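In its standard form, a mel-frequency cepstrum is a discrete cosine transform (DCT) of log filterbank energies, and a DCT is a fixed linear map. The sketch below is a minimal illustration under that assumption: it writes the transform as a log nonlinearity followed by a linear layer whose weights start at the DCT matrix, which is one way a network can "replicate" a reference cepstral feature before training. The mel filterbank, framing, the paper's actual network architecture and the joint SVM training are all omitted, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def dct_weights(num_coeffs: int, num_bands: int) -> List[List[float]]:
    """Unnormalised DCT-II matrix W[k][n] = cos(pi * k * (n + 0.5) / N)."""
    return [
        [math.cos(math.pi * k * (n + 0.5) / num_bands) for n in range(num_bands)]
        for k in range(num_coeffs)
    ]


def cepstral_layer(band_energies: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   eps: float = 1e-10) -> List[float]:
    """Log compression followed by a linear layer.

    When `weights` is the DCT matrix this reproduces an MFCC-style cepstrum;
    training would then adjust `weights` away from that starting point.
    """
    log_e = [math.log(e + eps) for e in band_energies]
    return [sum(w * v for w, v in zip(row, log_e)) for row in weights]


W = dct_weights(num_coeffs=4, num_bands=8)
print(cepstral_layer([2.0] * 8, W))  # flat spectrum: c[1:] are zero up to rounding
```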
Structural image retrieval using automatic image annotation and region based inverted file
- Authors: Zhang, Dengsheng , Islam, Md , Lu, Guojun
- Date: 2013
- Type: Text , Journal article
- Relation: Journal of Visual Communication and Image Representation Vol. 24, no. 7 (2013), p. 1087-1098
- Full Text: false
- Reviewed:
- Description: Image retrieval has lagged far behind text retrieval despite more than two decades of intensive research effort. Most research on image retrieval over the last two decades has focused on content-based image retrieval, i.e., image retrieval based on low-level features. Recent research in this area focuses on semantic image retrieval using automatic image annotation. Most semantic image retrieval techniques in the literature, however, treat an image as a bag of features/words while ignoring the structural or spatial information in the image. In this paper, we propose a structural image retrieval method based on automatic image annotation and a region-based inverted file. In the proposed system, regions in an image are treated the same way as keywords in a structured text document: semantic concepts are learnt from image data to label image regions as keywords, and a weight is assigned to each keyword according to its spatial position and relationships. As a result, images are indexed and retrieved in the same way as structured documents. Specifically, images are broken down into regions, which are represented using colour, texture and shape features. Region features are then quantized to create visual dictionaries, which are similar to monolingual dictionaries such as English or Chinese dictionaries. Next, a semantic dictionary, similar to a bilingual dictionary such as an English–Chinese dictionary, is learnt to map image regions to semantic concepts. Finally, images are indexed and retrieved using a novel region-based inverted file data structure. Results show that the proposed method has a significant advantage over the widely used Bayesian annotation models.
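The indexing side of a region-based inverted file can be sketched as a mapping from each semantic concept to a postings list of (image, weight) entries. This is a minimal sketch, assuming regions have already been labelled with concepts and that each label carries a precomputed weight standing in for the paper's spatial position and relationship weighting; the class name, the additive scoring rule and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class RegionInvertedFile:
    """Hypothetical concept -> {image_id: weight} index."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, float]] = defaultdict(dict)

    def add_image(self, image_id: str, regions: Iterable[Tuple[str, float]]) -> None:
        """Index an image given (concept, weight) pairs for its labelled regions."""
        for concept, weight in regions:
            # Several regions may share a concept; accumulate their weights.
            prev = self.postings[concept].get(image_id, 0.0)
            self.postings[concept][image_id] = prev + weight

    def search(self, query_concepts: Iterable[str], top_k: int = 10) -> List[Tuple[str, float]]:
        """Score images by summing the weights of matched concepts."""
        scores: Dict[str, float] = defaultdict(float)
        for concept in query_concepts:
            for image_id, weight in self.postings.get(concept, {}).items():
                scores[image_id] += weight
        # Highest score first; ties broken by image id for determinism.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


index = RegionInvertedFile()
index.add_image("img1", [("sky", 0.2), ("tiger", 0.8)])
index.add_image("img2", [("sky", 0.6), ("grass", 0.4)])
print(index.search(["tiger", "sky"]))
```

As in text retrieval, only the postings of the query concepts are touched, so query cost depends on those lists rather than on the size of the whole collection.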
A class centric feature and classifier ensemble selection approach for music genre classification
- Authors: Ariyaratne, Hasitha Bimsara , Zhang, Dengsheng , Lu, Guojun
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop SSPR & SPR 2012 p. 666-674
- Full Text: false
- Reviewed:
- Description: Music genre classification has attracted a lot of research interest due to the rapid growth of digital music. Despite the availability of a vast number of audio features and classification techniques, genre classification remains a challenging task. In this work we propose a class-centric feature and classifier ensemble selection method, which deviates from the conventional practice of employing a single classifier, or an ensemble of classifiers, trained with a selected set of audio features. We adopt a binary decomposition technique to divide the multiclass problem into a set of binary problems, which are then treated in a class-specific manner. This differs from traditional techniques, which operate on the naive assumption that a specific set of features and/or classifiers can perform equally well in identifying all the classes. Experimental results obtained on a popular genre dataset and a newly created dataset suggest significant improvements over traditional techniques.
A novel automatic hierarchical approach to music genre classification
- Authors: Ariyaratne, Hasitha , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: 2012 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)
- Full Text: false
- Reviewed:
- Description: Automatic music genre classification is an important component in Music Information Retrieval (MIR). It has gained a lot of attention lately due to the rapid growth in the use of digital music. Past work in this area has already produced a number of audio features and classification techniques; however, genre classification remains an unsolved problem. In this paper we explore a hybrid unsupervised/supervised top-down hierarchical classification approach. Most existing work on hierarchical music genre classification relies on human-built trees and taxonomies; however, these hierarchies may not always translate well into machine classification problems. Therefore, we explore an automatic approach to constructing a classification tree through subspace cluster analysis. Experimental results validate the tree-building algorithm and provide a new research direction for automatic genre classification. We also address the scarcity of publicly available music datasets by introducing a new dataset containing genre, artist and album labels.
A review on automatic image annotation techniques
- Authors: Zhang, Dengsheng , Islam, Md , Lu, Guojun
- Date: 2012
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 45, no. 1 (2012), p. 346-362
- Full Text: false
- Reviewed:
- Description: Nowadays, more and more images are available. However, finding a required image is a challenging task for an ordinary user. A large amount of research on image retrieval has been carried out in the past two decades. Traditionally, research in this area has focused on content-based image retrieval. However, recent research shows that there is a semantic gap between content-based image retrieval and image semantics understandable by humans. As a result, research in this area has shifted to bridging the semantic gap between low-level image features and high-level semantics. The typical method of bridging the semantic gap is automatic image annotation (AIA), which extracts semantic features using machine learning techniques. In this paper, we focus on this latest development in image retrieval and provide a comprehensive survey of automatic image annotation. We analyse key aspects of the various AIA methods, including both feature extraction and semantic learning methods. Major methods are discussed and illustrated in detail. We report our findings and provide future research directions in the AIA area in the conclusions.
An annotation rule extraction algorithm for image retrieval
- Authors: Chen, Zeng , Hou, Jin , Zhang, Dengsheng , Qin, Xue
- Date: 2012
- Type: Text , Journal article
- Relation: Pattern Recognition Letters Vol. 33, no. 10 (2012), p. 1257-1268
- Full Text: false
- Reviewed:
- Description: Automatic image annotation can be used to facilitate semantic search in large image databases. However, the retrieval performance of existing annotation schemes is far from users' expectations. In this paper, we propose a novel method to automatically annotate images through rules generated by support vector machines and decision trees. To obtain the rules, we collect a set of training regions by image segmentation, feature extraction and discretization. We first employ a support vector machine as a preprocessing technique to refine the input training data and then use it to improve the rules generated by decision tree learning. The preprocessing can also effectively deal with similar regions in an image. Moreover, we integrate the original rules with the modified ones to formulate complete and effective annotation rules. This algorithm can translate an unknown image into text, and the proposed system can retrieve images queried by both images and keywords. Experiments are carried out on a standard Corel dataset and on images collected from the Web to test the accuracy and robustness of the proposed system. Experimental results show that the proposed algorithm can annotate and retrieve images more efficiently than traditional learning algorithms.
Comparison of curvelet and wavelet texture features for content based image retrieval
- Authors: Sumana, Ishrat , Lu, Guojun , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: 2012 IEEE International Conference on Multimedia and Expo (ICME) p. 290-295
- Full Text: false
- Reviewed:
- Description: Texture features play a vital role in content-based image retrieval (CBIR). The wavelet texture feature modeled by the generalized Gaussian density (GGD) [1] performs better than the discrete wavelet texture feature. The curvelet texture feature was proposed in [2]. In this paper, we compute a new texture feature by applying the generalized Gaussian density to the distribution of curvelet coefficients, which we call the curvelet GGD texture feature. The purpose of this paper is to investigate the curvelet GGD texture feature and compare its retrieval performance with that of the curvelet, wavelet and wavelet GGD texture features. Experimental results show that both the curvelet and curvelet GGD features perform significantly better than the wavelet and wavelet GGD texture features. Of the two curvelet-based features, the curvelet feature shows better CBIR performance than the curvelet GGD texture feature. The findings are discussed in the paper.
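A GGD texture feature summarises a subband of transform coefficients by the scale alpha and shape beta of the density p(x) = beta / (2 alpha Gamma(1/beta)) * exp(-(|x| / alpha)^beta). The sketch below fits these two parameters by simple moment matching, using the fact that (E|x|)^2 / E[x^2] = Gamma(2/beta)^2 / (Gamma(1/beta) Gamma(3/beta)) depends on beta alone. Moment matching is an assumption swapped in for illustration; the estimator used in [1] and in the paper may differ (e.g., maximum likelihood), and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ggd_moment_ratio(beta: float) -> float:
    """(E|x|)^2 / E[x^2] for a zero-mean GGD with shape beta (increasing in beta)."""
    return math.gamma(2.0 / beta) ** 2 / (math.gamma(1.0 / beta) * math.gamma(3.0 / beta))


def beta_from_ratio(r: float, lo: float = 0.1, hi: float = 10.0, iters: int = 100) -> float:
    """Invert ggd_moment_ratio by bisection; ratios outside the bracket end near lo or hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_moment_ratio(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_ggd(coeffs: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, beta) for one subband of transform coefficients."""
    n = len(coeffs)
    m1 = sum(abs(c) for c in coeffs) / n
    m2 = sum(c * c for c in coeffs) / n
    beta = beta_from_ratio(m1 * m1 / m2)
    # E[x^2] = alpha^2 * Gamma(3/beta) / Gamma(1/beta), solved for alpha.
    alpha = math.sqrt(m2 * math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return alpha, beta


print(fit_ggd([-1.0, 0.5, 2.0, -0.3, 0.8]))
```

A descriptor would then concatenate the (alpha, beta) pairs of all subbands. As a sanity check, a Gaussian subband has a ratio of 2/pi and hence beta = 2, and a Laplacian one has a ratio of 1/2 and hence beta = 1.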
Learning sparse kernel classifiers in the primal
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop, SSPR&SPR 2012; Hiroshima, Japan; 7th-9th November 2012; published in Structural, Syntactic, and Statistical Pattern Recognition (part of the Lecture Notes in Computer Science) Vol. 7626, p. 60-69
- Full Text: false
- Reviewed:
- Description: The increasing number of classification applications on large data sets demands classifiers that are efficient not only in training but also in prediction. In this paper, we address the problem of learning kernel classifiers with reduced complexity and improved prediction efficiency compared with those trained by standard methods. A single optimisation problem is formulated for classifier learning, which jointly optimises both the classifier weights and the eXpansion Vectors (XVs) that define the classification function. Unlike the existing approach of Wu et al., which performs optimisation in the dual formulation, our approach solves the primal problem directly. The primal problem is much more efficient to solve, as it can be converted to the training of a linear classifier in each iteration, which scales linearly with the size of the data set and the number of expansions. This makes our primal approach highly desirable for large-scale applications, where the dual approach is inadequate and prohibitively slow because it requires solving a cubic-time kernel SVM in each iteration. Experimental results demonstrate the efficiency and effectiveness of the proposed primal approach for learning sparse kernel classifiers, which clearly outperform the alternatives.
A survey of audio-based music classification and annotation
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 13, no. 2 (2011), p. 303-319
- Full Text: false
- Reviewed:
- Description: Music information retrieval (MIR) is an emerging research area that receives growing attention from both the research community and the music industry. It addresses the problem of querying and retrieving certain types of music from large music data sets. Classification is a fundamental problem in MIR. Many tasks in MIR can be naturally cast in a classification setting, such as genre classification, mood classification, artist recognition and instrument recognition. Music annotation, a new research area in MIR that has attracted much attention in recent years, is also a classification problem in the general sense. Due to the importance of music classification in MIR research, the rapid development of new methods, and the lack of review papers on recent progress in the field, we provide a comprehensive review of audio-based classification in this paper and systematically summarize the state-of-the-art techniques for music classification. Specifically, we stress the differences in the features and the types of classifiers used for different classification tasks. This survey emphasizes recent developments in these techniques and discusses several open issues for future research.
Automatic image search based on improved feature descriptors and decision tree
- Authors: Hou, Jin , Chen, Zeng , Qin, Xue , Zhang, Dengsheng
- Date: 2011
- Type: Text , Journal article
- Relation: Integrated Computer-Aided Engineering Vol. 18, no. 2 (2011), p. 167-180
- Full Text: false
- Reviewed:
- Description: There has been growing interest in implementing image search engines at the semantic level. However, most existing practical systems, including popular commercial image search engines like Google and Yahoo!, are either text-based or a simple hybrid of text and visual features. This paper proposes a novel image search system based on automatic image annotation. We develop a technology that learns semantic image concepts from image contents and transforms unstructured images into textual documents, so that images are indexed and retrieved in the same way as textual documents. Because images are transformed into textual documents through machine learning, existing database management systems can be used to manage image contents effectively, and image search can be as efficient as text search. Experiments on both the Corel dataset and a real Web dataset are performed to validate our system, and the results are promising. This system suggests a new combination of text and visual features to achieve semantic image search, and is expected to serve as a re-ranking system for existing Internet image search results.
Building sparse support vector machines for multi-instance classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD) p. 471-486
- Full Text: false
- Reviewed:
- Description: We propose a direct approach to learning sparse Support Vector Machine (SVM) prediction models for Multi-Instance (MI) classification. The proposed sparse SVM is based on a “label-mean” formulation of MI classification, which takes the average of the predictions of individual instances for bag-level prediction. This leads to a convex optimization problem, which is essential both for the tractability of the sparse SVM formulation we subsequently derive and for the validity of the optimization strategy we employ to solve it. Based on the “label-mean” formulation, we can build sparse SVM models for MI classification and explicitly control their sparsity by enforcing a maximum number of expansions allowed in the prediction function. An effective optimization strategy is adopted to solve the formulated sparse learning problem, which involves learning both the classifier and the expansion vectors. Experimental results on benchmark data sets demonstrate that the proposed approach is effective in building very sparse SVM models while achieving performance comparable to the state-of-the-art MI classifiers.
Music classification via the bag-of-features approach
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Journal article
- Relation: Pattern Recognition Letters Vol. 32, no. 14 (2011), p. 1768-1777
- Full Text: false
- Reviewed:
- Description: A central problem in music information retrieval is audio-based music classification. Current music classification systems follow a frame-based analysis model. A whole song is split into frames, where a feature vector is extracted from each local frame. Each song can then be represented by a set of feature vectors. How to utilize the feature set for global song-level classification is an important problem in music classification. Previous studies have used summary features and probability models which are either overly restrictive in modeling power or numerically too difficult to solve. In this paper, we investigate the bag-of-features approach for music classification which can effectively aggregate the local features for song-level feature representation. Moreover, we have extended the standard bag-of-features approach by proposing a multiple codebook model to exploit the randomness in the generation of codebooks. Experimental results for genre classification and artist identification on benchmark data sets show that the proposed classification system is highly competitive against the standard methods.
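The aggregation step of a bag-of-features representation can be sketched as vector quantization of each frame against a codebook, followed by a normalized histogram of codeword counts. This is a minimal sketch, assuming the codebooks are given (typically obtained by clustering training frames, e.g. with k-means); concatenating the histograms from several independently generated codebooks is one hypothetical way to use multiple codebooks and is not necessarily how the paper combines them. Function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to x."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])),
    )


def bof_histogram(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Song-level feature: fraction of frames assigned to each codeword."""
    counts = [0.0] * len(codebook)
    for x in frames:
        counts[nearest_codeword(x, codebook)] += 1.0
    return [c / len(frames) for c in counts]


def multi_codebook_feature(frames: Sequence[Vector],
                           codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Concatenate the histograms obtained with each codebook (an assumption)."""
    feature: List[float] = []
    for codebook in codebooks:
        feature.extend(bof_histogram(frames, codebook))
    return feature


frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
print(bof_histogram(frames, [[0.0, 0.0], [5.0, 5.0]]))
```

Whatever the number of frames in a song, the output has one entry per codeword, which is what lets a standard classifier operate at the song level.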
On low-rank regularized least squares for scalable nonlinear classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: International Conference on Neural Information Processing p. 490-499
- Full Text: false
- Reviewed:
- Description: In this paper, we revisit the classical technique of Regularized Least Squares (RLS) for the classification of large-scale nonlinear data. Specifically, we focus on a low-rank formulation of RLS and show that its time complexity is linear in the data size only and does not depend on the number of labels and features for problems with moderate feature dimension. This makes low-rank RLS particularly suitable for classification with large data sets. Moreover, we propose a general theorem for closed-form solutions to the Leave-One-Out Cross Validation (LOOCV) estimation problem in empirical risk minimization, which encompasses all types of RLS classifiers as special cases. This eliminates the reliance on cross validation, a computationally expensive process for parameter selection, and greatly accelerates the training process of RLS classifiers. Experimental results on real and synthetic large-scale benchmark data sets show that low-rank RLS achieves comparable classification performance while being much more efficient than standard kernel SVM for nonlinear classification. The improvement in efficiency is more evident for data sets with higher dimensions.
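The idea behind closed-form LOOCV can be illustrated with the classical special case of ridge regression, where the leave-one-out residual equals (y_i - yhat_i) / (1 - H_ii), with H the hat matrix, so all n held-out errors come from a single fit. The sketch below is a minimal illustration under simplifying assumptions (one feature, no bias term) and includes a brute-force check; it is not the paper's general theorem and does not show the low-rank formulation. Function names are hypothetical.

```python
from typing import List


def ridge_1d(xs: List[float], ys: List[float], lam: float) -> float:
    """Weight w minimising sum (y - w x)^2 + lam * w^2 (single feature, no bias)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_residuals_closed_form(xs: List[float], ys: List[float], lam: float) -> List[float]:
    """All leave-one-out residuals from one fit: (y_i - yhat_i) / (1 - H_ii)."""
    s = sum(x * x for x in xs) + lam
    w = sum(x * y for x, y in zip(xs, ys)) / s
    # For one feature, the hat-matrix diagonal is H_ii = x_i^2 / s (< 1 when lam > 0).
    return [(y - w * x) / (1.0 - x * x / s) for x, y in zip(xs, ys)]


def loo_residuals_brute_force(xs: List[float], ys: List[float], lam: float) -> List[float]:
    """Reference: refit n times, each time holding one sample out."""
    out = []
    for i in range(len(xs)):
        w = ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        out.append(ys[i] - w * xs[i])
    return out


def select_lambda(xs: List[float], ys: List[float], grid: List[float]) -> float:
    """Pick the regularisation strength with the smallest mean squared LOO error."""
    def loo_mse(lam: float) -> float:
        r = loo_residuals_closed_form(xs, ys, lam)
        return sum(e * e for e in r) / len(r)
    return min(grid, key=loo_mse)


xs = [1.0, 2.0, 3.0, -1.0]
ys = [1.2, 1.9, 3.2, -0.8]
print(loo_residuals_closed_form(xs, ys, 0.5))
print(loo_residuals_brute_force(xs, ys, 0.5))
```

The closed form needs one pass over the data per candidate lambda instead of n refits, which is the kind of saving that makes LOOCV-based parameter selection cheap.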
Rotation invariant curvelet features for region based image retrieval
- Authors: Zhang, Dengsheng , Islam, Md , Lu, Guojun , Sumana, Ishrat
- Date: 2011
- Type: Text , Journal article
- Relation: International Journal of Computer Vision Vol. 98, no. 2 (2011), p. 187-201
- Full Text: false
- Reviewed:
- Description: There has been much interest and a large amount of research on content-based image retrieval (CBIR) in recent years due to the ever-increasing number of digital images. Texture features play a key role in CBIR. Many texture features exist in the literature; however, most of them are neither rotation invariant nor robust to scale and other variations. Texture features based on Gabor filters have been shown to have significant advantages over other methods, and they have been adopted by MPEG-7 as one of the texture descriptors for image retrieval. In this paper, we propose rotation-invariant curvelet features for texture representation. With systematic analysis and rigorous experiments, we show that the proposed curvelet texture features significantly outperform the widely used Gabor texture features. A novel region padding method is also proposed to apply the curvelet transform to region-based image retrieval. Retrieval results on standard image databases show that curvelet features are promising for both texture and region representation.
An enhancement to closed-form method for natural image matting
- Authors: Zhu, Jun , Zhang, Dengsheng , Lu, Guojun
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 2010 Digital Image Computing: Techniques and Applications p. 629-634
- Full Text: false
- Reviewed:
- Description: Natural image matting is the task of estimating the fractional opacity of the foreground layer of an image. Many matting methods have been proposed, and most of them are trimap-based. Among these methods, closed-form matting supports both trimap-based and scribble-based matting. However, the closed-form method causes significant errors in background-hole regions due to over-smoothing. In this paper, we identify the source of the problem and propose a solution to enhance the closed-form method. Experiments show that our enhanced method improves accuracy for trimap-based images and obtains results similar to the closed-form method for scribble-based matting.
Connectivity-based shape descriptors
- Authors: Sajjanhar, Atul , Lu, Guojun , Zhang, Dengsheng , Zhou, Wanle
- Date: 2010
- Type: Text , Journal article
- Relation: International Journal of Computers and Applications Vol. 32, no. 1 (2010), p. 93-98
- Full Text: false
- Reviewed:
- Description: In this paper, we propose a method for indexing and retrieval of images based on the shapes of objects. The concept of connectivity is introduced. 3D models are used to represent 2D images: the 2D images are first decomposed using connectivity, and 3D models are then constructed. 3D model descriptors are obtained for the 3D models and used to represent the underlying 2D shapes. We use spherical harmonics descriptors as the 3D model descriptors. The difference between two images is computed as the Euclidean distance between their descriptors. Experiments are performed to test the effectiveness of spherical harmonics for retrieval of 2D images. The proposed method is compared with methods based on principal component analysis (PCA) and generic Fourier descriptors (GFD), and it is found to be effective. Item S8 within the MPEG-7 still images content set is used for performing the experiments.
Improved spatial pyramid matching for image classification
- Authors: Shahiduzzaman, Mohammad , Zhang, Dengsheng , Lu, Guojun
- Date: 2010
- Type: Text , Conference paper
- Relation: 10th Asian Conference on Computer Vision p. 449-459
- Full Text: false
- Reviewed:
- Description: Spatial analysis of salient feature points has been shown to be promising in image analysis and classification. Spatial pyramid matching makes use of both salient feature points and spatial multiresolution blocks to match images. However, it has been shown that different images or blocks can still have similar features under spatial pyramid matching. The analysis and matching will be more accurate in scale space. In this paper, we propose to perform spatial pyramid matching in scale space. Specifically, pyramid match histograms are computed at multiple scales to refine the kernel for support vector machine classification. We show that the combination of salient point features, scale space and spatial pyramid matching significantly improves on the original spatial pyramid matching.
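For reference, the baseline spatial pyramid match kernel that this work extends can be sketched as follows: features already quantized to visual words are binned into 2^l x 2^l grids at each level l, the per-level histograms are compared by histogram intersection, and finer levels receive larger weights (1/2^L at level 0 and 1/2^(L-l+1) at level l >= 1, as in the commonly used formulation). This is a minimal sketch under those assumptions; the paper's extension of computing the histograms at multiple scales is not shown, and the names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

# (x, y, visual_word) with x and y normalised to [0, 1]
Feature = Tuple[float, float, int]


def level_histogram(features: Sequence[Feature], level: int) -> Counter:
    """Count visual words per cell of a 2^level x 2^level grid."""
    cells = 2 ** level
    return Counter(
        (min(int(x * cells), cells - 1), min(int(y * cells), cells - 1), word)
        for x, y, word in features
    )


def intersection(h1: Counter, h2: Counter) -> int:
    """Histogram intersection: sum of bin-wise minima (missing bins count as 0)."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def spm_kernel(f1: Sequence[Feature], f2: Sequence[Feature], levels: int = 2) -> float:
    """Weighted sum of per-level intersections; finer levels weigh more."""
    total = 0.0
    for level in range(levels + 1):
        matches = intersection(level_histogram(f1, level), level_histogram(f2, level))
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        total += weight * matches
    return total


a = [(0.1, 0.1, 0), (0.9, 0.2, 1), (0.6, 0.7, 0)]
b = [(0.9, 0.9, 0), (0.9, 0.2, 1), (0.6, 0.7, 0)]
print(spm_kernel(a, a), spm_kernel(a, b))
```

Because the weights sum to 1, an image matched with itself scores its number of features, and moving a feature to a different cell lowers the score only at the levels where the cells differ.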
Learning naive Bayes classifiers for music classification and retrieval
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 20th International Conference on Pattern Recognition p. 4589-4592
- Full Text: false
- Reviewed:
- Description: In this paper, we explore the use of naive Bayes classifiers for music classification and retrieval. The motivation is to employ all audio features extracted from local windows for classification instead of just using a single song-level feature vector produced by compressing the local features. Two variants of naive Bayes classifiers are studied based on the extensions of standard nearest neighbor and support vector machine classifiers. Experimental results have demonstrated superior performance achieved by the proposed naive Bayes classifiers for both music classification and retrieval as compared to the alternative methods.
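Under the naive Bayes assumption, the local features of a song are treated as independent given the class, so per-frame evidence simply adds up. One well-known nearest-neighbour instance of this idea, the naive Bayes nearest-neighbour (NBNN) rule, scores each class by the summed squared distance from every frame to its nearest training frame of that class. The sketch below shows that rule only as an illustration; it is an assumption, not necessarily either of the two variants studied in the paper, and the names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(frames: Sequence[Vector], class_pools: Dict[str, List[Vector]]) -> str:
    """Return the class whose pooled training frames are closest overall.

    class_pools maps a label to all local feature vectors extracted from that
    class's training songs, i.e. no song-level compression is applied.
    """
    best_label, best_cost = "", math.inf
    for label in sorted(class_pools):  # sorted for deterministic tie-breaking
        pool = class_pools[label]
        # Sum over frames of the distance to the nearest frame in this class.
        cost = sum(min(sq_dist(x, p) for p in pool) for x in frames)
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


pools = {"rock": [[0.0, 0.0], [1.0, 0.0]], "jazz": [[5.0, 5.0], [6.0, 5.0]]}
print(nbnn_classify([[0.2, 0.1], [0.9, 0.1]], pools))
```

Every local frame contributes to the decision, which matches the motivation of using all window-level features rather than a single compressed song-level vector; the price is a nearest-neighbour search per frame.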
Novel spectral descriptor for object shape
- Authors: Sajjanhar, Atul , Lu, Guojun , Zhang, Dengsheng
- Date: 2010
- Type: Text , Book chapter
- Relation: Proceedings of the 11th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing p. 58-67
- Full Text:
- Reviewed:
- Description: In this paper, we propose a novel descriptor for shapes. The proposed descriptor is obtained from 3D spherical harmonics. The inadequacy of 2D spherical harmonics is addressed, and the method to obtain 3D spherical harmonics is described. 3D spherical harmonics require the construction of a 3D model, which implicitly represents rich features of objects. Spherical harmonics are used to obtain descriptors from the 3D models. The performance of the proposed method is compared against the CSS approach, which is the MPEG-7 descriptor for shape contour. The MPEG-7 dataset of shape contours, namely CE-1, is used to perform the experiments. It is shown that the proposed method is effective.