A survey on image classification of lightweight convolutional neural network
- Authors: Liu, Ying , Xiao, Peng , Fang, Jie , Zhang, Dengsheng
- Date: 2023
- Type: Text , Conference paper
- Relation: 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, ICNC-FSKD 2023, Harbin, China, 29-31 July 2023, 2023 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)
- Full Text: false
- Reviewed:
- Description: In recent years, deep neural networks have achieved tremendous success in image classification in both academic and industrial settings. However, the high hardware requirements imposed by their intensive and complex computations pose a challenge for deployment on low-storage devices. To address this challenge, lightweight networks provide a viable solution. This paper provides a detailed review of recent lightweight image classification algorithms, which can be categorized into low-redundancy network model design and neural network compression algorithms. The former reduces network computations by replacing traditional convolution with efficient lightweight convolution, while the latter reduces redundancy in the network by employing methods such as network pruning, knowledge distillation, and parameter quantization. We summarize the experimental results of some classical models and algorithms on ImageNet2012 and CIFAR-10 datasets, and analyze the characteristics, advantages and disadvantages of these models respectively. Finally, future research directions for lightweight algorithms in the field of image classification are identified. © 2023 IEEE.
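To make the "efficient lightweight convolution" idea in the description above concrete, the following minimal sketch compares weight counts for a standard convolution and a depthwise separable convolution. Depthwise separable convolution is chosen here only as a well-known example of a lightweight replacement; the function names and layer sizes are illustrative assumptions, not taken from the survey.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes (assumed, not from the paper).
    k, c_in, c_out = 3, 128, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, depthwise separable: {sep}, reduction: {std / sep:.2f}x")
```

For a 3 x 3 layer the reduction approaches 9x as the number of output channels grows, which is why this substitution is a common starting point for lightweight designs.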
Fine-grained image classification based on knowledge distillation
- Authors: Liu, Ying , Feng, Hao , Zhang, Weidong , Fang, Jie , Xiao, Peng , Zhang, Dengsheng
- Date: 2023
- Type: Text , Conference paper
- Relation: 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, ICNC-FSKD 2023, Harbin, China, 29-31 July 2023, 2023 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)
- Full Text: false
- Reviewed:
- Description: Despite the outstanding performance of deep learning-based fine-grained image classification methods, the commonly used models still suffer from high computation and memory costs. Therefore, this paper proposes a mobile-based CNN network that focuses on discriminative features of fine-grained images by embedding a hybrid-domain attention module to achieve higher accuracy in recognition. Specifically, under the premise of reducing network parameters, this paper presents a classification method that combines transfer learning and knowledge distillation to enhance the model's generalization performance and resistance to overfitting. Different knowledge transfer strategies are validated through experiments in the knowledge distillation process. Mobile models such as SqueezeNet, MobileNetV2, and CBAM MobileNetV2 all demonstrate enhanced performance with the knowledge distillation optimization. The proposed method can be used to develop a lightweight mobile-based CNN model with performance comparable to complex models, making it more advantageous in real-life scenarios with limited storage resources and low hardware computation levels. Additionally, the model compression process utilizes only the intermediate features of the original dataset, meeting the confidentiality requirements of the original data in the field of public security. © 2023 IEEE.
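As a rough illustration of the knowledge distillation mentioned in the description above, the sketch below implements the widely used soft-target loss (temperature-scaled KL divergence between teacher and student, combined with a hard-label cross-entropy term). This is the generic textbook formulation and is assumed here for illustration; the paper's own transfer strategies may differ. The names `distillation_loss`, `temperature`, and `alpha` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
    temperature: float = 4.0,  # assumed value, chosen for illustration
    alpha: float = 0.5,        # assumed weighting between soft and hard terms
) -> float:
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard term: cross-entropy of the student's ordinary prediction with the true label.
    hard = -math.log(softmax(student_logits)[true_label])
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [5.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    print(distillation_loss(student, teacher, true_label=0))
```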
A kernel-based approach for content-based image retrieval
- Authors: Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun , Zhang, Dengsheng
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Image and Vision Computing New Zealand; Auckland, New Zealand; 19th-21st November 2018 p. 1-6
- Full Text: false
- Reviewed:
- Description: Content-based image retrieval (CBIR) is a popular approach to retrieving images based on a query. In CBIR, retrieval is executed based on the properties of image contents (e.g. gradient, shape, color, texture), which are generally encoded into image descriptors. Among the various image descriptors, histogram-based descriptors are very popular. However, they suffer from the limitation of coarse quantization. In contrast, kernel descriptors (KDES) have proven more effective than histogram-based descriptors in other applications, e.g. image classification. This is because, in the KDES framework, instead of quantizing pixel attributes, each pixel takes part equally in the similarity measurement between two images. In this paper, we propose an approach for using the conventional KDES and its improved version for CBIR. In addition, we provide a detailed insight into the effectiveness of the improved kernel descriptors. Finally, our experimental results show that kernel descriptors are significantly more effective than histogram-based descriptors in CBIR.
A novel perceptual dissimilarity measure for image retrieval
- Authors: Shojanazeri, Hamid , Zhang, Dengsheng , Teng, Shyh , Aryal, Sunil , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 Vol. 2018-November, p. 1-6
- Full Text: false
- Reviewed:
- Description: Similarity measurement is an important research topic in image classification and retrieval. Given a type of image feature, a good similarity measure should be able to retrieve similar images from the database while discarding irrelevant images from the retrieval. Similarity measures in the literature are typically distance-based: they measure the spatial distance between two feature vectors in a high-dimensional feature space. However, this type of similarity measure does not have any perceptual meaning and ignores the neighborhood influence in the similarity decision-making process. In this paper, we propose a novel dissimilarity measure that can measure both the distance and the perceptual similarity of two image features in feature space. Results show that the proposed measure yields a significant improvement over the traditional distance-based similarity measures commonly used in the literature.
- Description: International Conference Image and Vision Computing New Zealand
Enhancing the effectiveness of local descriptor based image matching
- Authors: Hossain, Md Tahmid , Teng, Shyh , Zhang, Dengsheng , Lim, Suryani , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018; Canberra, Australia; 10th-13th December 2018 p. 1-8
- Full Text: false
- Reviewed:
- Description: Image registration has received great attention from researchers over the last few decades. SIFT (Scale Invariant Feature Transform), a local descriptor-based technique, is widely used for registering and matching images. To establish correspondences between images, SIFT uses a Euclidean distance ratio metric. However, this approach leads to many incorrect matches, and eliminating these inaccurate matches has been a challenge. Various methods have been proposed to mitigate this problem. In this paper, we propose a scale and orientation harmony-based pruning method that improves the image matching process by successfully eliminating incorrect SIFT descriptor matches. Moreover, our technique can predict the image transformation parameters based on a novel adaptive clustering method with much higher matching accuracy. Our experimental results show that the proposed method achieves, on average, approximately 16% and 10% higher matching accuracy than the traditional SIFT and a contemporary method, respectively.
- Description: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018
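The Euclidean distance ratio metric that the abstract above attributes to SIFT matching can be sketched as follows. This shows only the baseline ratio test, not the paper's pruning or clustering method. The threshold of 0.8 is the commonly cited default and is an assumption here, as are the function names and the toy two-dimensional descriptors (real SIFT descriptors have 128 dimensions).

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(
    query_descs: Sequence[Sequence[float]],
    train_descs: Sequence[Sequence[float]],
    ratio: float = 0.8,  # assumed threshold; tune per application
) -> List[Tuple[int, int]]:
    """Keep a match only if the nearest neighbour is clearly closer than the second nearest."""
    matches = []
    for qi, q in enumerate(query_descs):
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(train_descs))
        if len(dists) < 2:
            continue  # the ratio is undefined with fewer than two candidates
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


if __name__ == "__main__":
    query = [[0.0, 0.0], [5.0, 5.0]]
    train = [[0.0, 0.1], [10.0, 10.0], [5.0, 4.9], [5.1, 5.0]]
    # The second query descriptor has two almost equally close candidates,
    # so the ratio test treats it as ambiguous and drops it.
    print(ratio_test_matches(query, train))
```

Matches that survive this test can still be wrong, which is the problem that pruning methods such as the one described above aim to address.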
Image clustering using a similarity measure incorporating human perception
- Authors: Shojanazeri, Hamid , Aryal, Sunil , Teng, Shyh , Zhang, Dengsheng , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 p. 1-6
- Full Text: false
- Reviewed:
- Description: Clustering similar images is an important task in image processing and computer vision. It requires a measure to quantify pairwise similarities of images. The performance of a clustering algorithm depends on the choice of similarity measure. In this paper, we investigate the effectiveness of data-independent (distance-based), data-dependent (mass-based) and hybrid (dis)similarity measures in the image clustering task using three benchmark image collections with different sets of features. Our K-Medoids clustering results show that using the hybrid Perceptual Dissimilarity Measure (PMD) produces better clustering results than the distance-based l_p-norm and the mass-based m_p-dissimilarity.
Extracting road centrelines from binary road images by optimizing geodesic lines
- Authors: Zhou, Shaoguang , Lu, Guojun , Teng, Shyh , Zhang, Dengsheng
- Date: 2016
- Type: Text , Conference proceedings , Conference paper
- Relation: 2015 International Conference on Image and Vision Computing New Zealand, IVCNZ 2015; Auckland, New Zealand; 23rd-24th November 2015 Vol. 2016-November, p. 1-6
- Full Text: false
- Reviewed:
- Description: Binary road images can be obtained from remotely sensed images with the aid of classification and segmentation techniques. Extracting road centrelines from these binary images is crucial for updating a Geographic Information System (GIS) database. A current state-of-the-art method of centreline extraction needs to remove road junctions and depends on the accuracy of the endpoints, leading to three main limitations: (1) causing small gaps in the roads, (2) wrongly treating short non-road segments as roads, and (3) producing centrelines of low accuracy around the road end regions. To overcome these limitations, we propose an iterative search scheme to obtain the longest geodesic line in the preprocessed road skeleton images. Several image pixels at each end of the geodesic lines are removed to avoid noise, and the remaining parts are optimized using a dynamic programming snake model. The proposed method is applied to three types of binary road images and compared with the state-of-the-art method. The results show that the proposed method is less affected by the end regions of the roads and is effective in filling the gaps in the roads. It also has an advantage in processing short non-road segments. © 2015 IEEE.
- Description: International Conference Image and Vision Computing New Zealand
Optimizing cepstral features for audio classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Conference paper
- Relation: International Joint Conference on Artificial Intelligence p. 1330-1336
- Full Text: false
- Reviewed:
- Description: Cepstral features have been widely used in audio applications. Domain knowledge has played an important role in designing different types of cepstral features proposed in the literature. In this paper, we present a novel approach for learning optimized cepstral features directly from audio data to better discriminate between different categories of signals in classification tasks. We employ multi-layer feedforward neural networks to model the cepstral feature extraction process. The network weights are initialized to replicate a reference cepstral feature such as the mel frequency cepstral coefficient. We then propose an embedded approach that integrates feature learning with the training of a support vector machine (SVM) classifier. A single optimization problem is formulated in which the feature and classifier variables are optimized simultaneously so as to refine the initial features and minimize the classification risk. Experimental results have demonstrated the effectiveness of the proposed feature learning approach, outperforming competing methods by a large margin on benchmark data.
A class centric feature and classifier ensemble selection approach for music genre classification
- Authors: Ariyaratne, Hasitha Bimsara , Zhang, Dengsheng , Lu, Guojun
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop SSPR & SPR 2012 p. 666-674
- Full Text: false
- Reviewed:
- Description: Music genre classification has attracted a lot of research interest due to the rapid growth of digital music. Despite the availability of a vast number of audio features and classification techniques, genre classification still remains a challenging task. In this work we propose a class centric feature and classifier ensemble selection method which deviates from the conventional practice of employing a single, or an ensemble of classifiers trained with a selected set of audio features. We adopt a binary decomposition technique to divide the multiclass problem into a set of binary problems which are then treated in a class specific manner. This differs from the traditional techniques which operate on the naive assumption that a specific set of features and/or classifiers can perform equally well in identifying all the classes. Experimental results obtained on a popular genre dataset and a newly created dataset suggest significant improvements over traditional techniques.
A novel automatic hierachical approach to music genre classification
- Authors: Ariyaratne, Hasitha , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: 2012 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)
- Full Text: false
- Reviewed:
- Description: Automatic music genre classification is an important component in Music Information Retrieval (MIR). It has gained a lot of attention lately due to the rapid growth in the use of digital music. Past work in this area has already produced a number of audio features and classification techniques; however, genre classification still remains an unsolved problem. In this paper we explore a hybrid unsupervised/supervised top-down hierarchical classification approach. Most existing work on hierarchical music genre classification relies on human-built trees and taxonomies; however, these hierarchies may not always translate well into machine classification problems. Therefore, we explore an automatic approach to constructing a classification tree through subspace cluster analysis. Experimental results validate the tree building algorithm and provide a new research direction for automatic genre classification. We also address the scarcity of publicly available music datasets by introducing a new dataset containing genre, artist and album labels.
Comparison of curvelet and wavelet texture features for content based image retrieval
- Authors: Sumana, Ishrat , Lu, Guojun , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: 2012 IEEE International Conference on Multimedia and Expo (ICME) p. 290-295
- Full Text: false
- Reviewed:
- Description: Texture features play a vital role in content-based image retrieval (CBIR). The wavelet texture feature modeled by generalized Gaussian density (GGD) [1] performs better than the discrete wavelet texture feature. The curvelet texture feature was proposed in [2]. In this paper, we compute a new texture feature by applying the generalized Gaussian density to the distribution of curvelet coefficients, which we call the curvelet GGD texture feature. The purpose of this paper is to investigate the curvelet GGD texture feature and compare its retrieval performance with that of the curvelet, wavelet and wavelet GGD texture features. Experimental results show that both the curvelet and curvelet GGD features perform significantly better than the wavelet and wavelet GGD texture features. Of the two curvelet-based features, the curvelet feature shows better performance in CBIR than the curvelet GGD texture feature. The findings are discussed in the paper.
Learning sparse kernel classifiers in the primal
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2012
- Type: Text , Conference paper
- Relation: Joint IAPR International Workshop, SSPR&SPR 2012; Hiroshima, Japan; 7th-9th November 2012; published in Structural, Syntactic, and Statistical Pattern Recognition (part of the Lecture Notes in Computer Science) Vol. 7626, p. 60-69
- Full Text: false
- Reviewed:
- Description: The increasing number of classification applications on large data sets demands classifiers that are efficient not only in training but also in prediction. In this paper, we address the problem of learning kernel classifiers with reduced complexity and improved prediction efficiency in comparison to those trained by standard methods. A single optimisation problem is formulated for classifier learning which jointly optimises both the classifier weights and the eXpansion Vectors (XVs) that define the classification function. Unlike the existing approach of Wu et al., which performs optimisation in the dual formulation, our approach solves the primal problem directly. The primal problem is much more efficient to solve, as it can be converted to the training of a linear classifier in each iteration, which scales linearly with the size of the data set and the number of expansions. This makes our primal approach highly desirable for large-scale applications, where the dual approach is inadequate and prohibitively slow because each iteration involves solving a cubic-time kernel SVM. Experimental results have demonstrated the efficiency and effectiveness of the proposed primal approach for learning sparse kernel classifiers that clearly outperform the alternatives.
Building sparse support vector machines for multi-instance classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: European Conference on Machine Learning Knowledge Discovery in Databases (ECML PKDD) p. 471-486
- Full Text: false
- Reviewed:
- Description: We propose a direct approach to learning sparse Support Vector Machine (SVM) prediction models for Multi-Instance (MI) classification. The proposed sparse SVM is based on a “label-mean” formulation of MI classification which takes the average of predictions of individual instances for bag-level prediction. This leads to a convex optimization problem, which is essential for the tractability of the optimization problem arising from the sparse SVM formulation we derived subsequently, as well as the validity of the optimization strategy we employed to solve it. Based on the “label-mean” formulation, we can build sparse SVM models for MI classification and explicitly control their sparsities by enforcing the maximum number of expansions allowed in the prediction function. An effective optimization strategy is adopted to solve the formulated sparse learning problem which involves the learning of both the classifier and the expansion vectors. Experimental results on benchmark data sets have demonstrated that the proposed approach is effective in building very sparse SVM models while achieving comparable performance to the state-of-the-art MI classifiers.
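The "label-mean" rule described above, in which a bag-level prediction is the average of the instance-level predictions, can be sketched in a few lines. A linear instance scorer is assumed here purely for brevity; the paper works with sparse kernel SVM models, and the names `instance_score`, `bag_score`, and `predict_bag` are hypothetical.

```python
from typing import Sequence


def instance_score(weights: Sequence[float], bias: float, instance: Sequence[float]) -> float:
    """Score of a single instance under an assumed linear model."""
    return sum(w * x for w, x in zip(weights, instance)) + bias


def bag_score(weights: Sequence[float], bias: float, bag: Sequence[Sequence[float]]) -> float:
    """Label-mean rule: the bag score is the mean of its instance scores."""
    return sum(instance_score(weights, bias, inst) for inst in bag) / len(bag)


def predict_bag(weights: Sequence[float], bias: float, bag: Sequence[Sequence[float]]) -> int:
    """Return +1 or -1 according to the sign of the averaged score."""
    return 1 if bag_score(weights, bias, bag) >= 0 else -1


if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0  # illustrative parameters, not learned
    positive_bag = [[2.0, 0.0], [0.0, 1.0]]
    negative_bag = [[0.0, 3.0], [1.0, 0.0]]
    print(predict_bag(w, b, positive_bag), predict_bag(w, b, negative_bag))
```

Because the bag score is an average of instance scores, it remains linear in the model parameters, which is consistent with the convexity the abstract highlights.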
On low-rank regularized least squares for scalable nonlinear classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2011
- Type: Text , Conference paper
- Relation: International Conference on Neural Information Processing p. 490-499
- Full Text: false
- Reviewed:
- Description: In this paper, we revisit the classical technique of Regularized Least Squares (RLS) for the classification of large-scale nonlinear data. Specifically, we focus on a low-rank formulation of RLS and show that it has linear time complexity in the data size only and does not depend on the number of labels and features for problems with moderate feature dimension. This makes low-rank RLS particularly suitable for classification with large data sets. Moreover, we propose a general theorem for the closed-form solutions to the Leave-One-Out Cross Validation (LOOCV) estimation problem in empirical risk minimization, which encompasses all types of RLS classifiers as special cases. This eliminates the reliance on cross validation, a computationally expensive process for parameter selection, and greatly accelerates the training process of RLS classifiers. Experimental results on real and synthetic large-scale benchmark data sets have shown that low-rank RLS achieves comparable classification performance while being much more efficient than standard kernel SVM for nonlinear classification. The improvement in efficiency is more evident for data sets with higher dimensions.
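To illustrate why a closed-form LOOCV estimate removes the need for repeated retraining, the sketch below uses the classical identity for ridge regression, where the leave-one-out residual equals the ordinary residual divided by one minus the corresponding diagonal entry of the hat matrix. A single feature with no intercept is assumed so the example needs no linear algebra library. This is the well-known special case, not the general theorem of the paper, and all function names are hypothetical.

```python
from typing import List, Sequence


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Ridge solution for one feature without intercept: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_residuals_closed_form(xs: Sequence[float], ys: Sequence[float], lam: float) -> List[float]:
    """Leave-one-out residuals from a single fit: e_i / (1 - H_ii), with lam > 0 assumed."""
    s = sum(x * x for x in xs) + lam
    w = sum(x * y for x, y in zip(xs, ys)) / s
    # H_ii = x_i^2 / (sum(x^2) + lam) is the diagonal of the hat matrix.
    return [(y - w * x) / (1.0 - x * x / s) for x, y in zip(xs, ys)]


def loo_residuals_brute_force(xs: Sequence[float], ys: Sequence[float], lam: float) -> List[float]:
    """Reference implementation: refit the model n times, leaving one sample out each time."""
    xs, ys = list(xs), list(ys)
    residuals = []
    for i in range(len(xs)):
        w = fit_ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        residuals.append(ys[i] - w * xs[i])
    return residuals


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 1.9, 3.2, 3.9]
    print(loo_residuals_closed_form(xs, ys, lam=0.5))
    print(loo_residuals_brute_force(xs, ys, lam=0.5))
```

The two functions agree up to floating-point error, but the closed-form version needs one fit instead of n.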
An enhancement to closed-form method for natural image matting
- Authors: Zhu, Jun , Zhang, Dengsheng , Lu, Guojun
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 2010 Digital Image Computing: Techniques and Applications p. 629-634
- Full Text: false
- Reviewed:
- Description: Natural image matting is the task of estimating the fractional opacity of the foreground layer in an image. Many matting methods have been proposed, and most of them are trimap-based. Among these methods, closed-form matting offers both trimap-based and scribble-based matting. However, the closed-form method causes significant errors at background-hole regions due to over-smoothing. In this paper, we identify the source of the problem and propose a solution to enhance the closed-form method. Experiments show that our enhanced method improves the accuracy for trimap-based matting and obtains results similar to the closed-form method for scribble-based matting.
Improved spatial pyramid matching for image classification
- Authors: Shahiduzzaman, Mohammad , Zhang, Dengsheng , Lu, Guojun
- Date: 2010
- Type: Text , Conference paper
- Relation: 10th Asian Conference on Computer Vision p. 449-459
- Full Text: false
- Reviewed:
- Description: Spatial analysis of salient feature points has been shown to be promising in image analysis and classification. In the past, spatial pyramid matching has made use of both salient feature points and spatial multiresolution blocks to match images. However, it has been shown that different images or blocks can still have similar features under spatial pyramid matching. The analysis and matching will be more accurate in scale space. In this paper, we propose to perform spatial pyramid matching in scale space. Specifically, pyramid match histograms are computed at multiple scales to refine the kernel for support vector machine classification. We show that the combination of salient point features, scale space and spatial pyramid matching significantly improves on the original spatial pyramid matching.
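For readers unfamiliar with the baseline that the abstract above builds on, the following sketch computes the original spatial pyramid match kernel from per-level histograms: a weighted sum of histogram intersections, with finer levels weighted more heavily. The scale-space extension proposed in the paper is not reproduced here. The input layout (one flat histogram per pyramid level) and the function names are assumptions made for illustration.

```python
from typing import Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of bin-wise minima of two histograms of equal length."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def pyramid_match_kernel(
    levels_x: Sequence[Sequence[float]],
    levels_y: Sequence[Sequence[float]],
) -> float:
    """Weighted sum of per-level intersections; index 0 is the coarsest level."""
    if len(levels_x) != len(levels_y):
        raise ValueError("both pyramids must have the same number of levels")
    finest = len(levels_x) - 1  # L in the usual notation
    total = 0.0
    for level, (hx, hy) in enumerate(zip(levels_x, levels_y)):
        # Level 0 gets weight 1/2^L; level l >= 1 gets weight 1/2^(L - l + 1).
        weight = 1.0 / 2 ** finest if level == 0 else 1.0 / 2 ** (finest - level + 1)
        total += weight * histogram_intersection(hx, hy)
    return total


if __name__ == "__main__":
    # Two-level toy pyramids: one global bin, then a 2 x 2 grid flattened to four bins.
    image_x = [[4.0], [3.0, 1.0, 0.0, 0.0]]
    image_y = [[4.0], [1.0, 1.0, 1.0, 1.0]]
    print(pyramid_match_kernel(image_x, image_y))
```

In the toy example both images have identical global histograms, and only the finer level reveals that their spatial layouts differ.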
Learning naive Bayes classifiers for music classification and retrieval
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 20th International Conference on Pattern Recognition p. 4589-4592
- Full Text: false
- Reviewed:
- Description: In this paper, we explore the use of naive Bayes classifiers for music classification and retrieval. The motivation is to employ all audio features extracted from local windows for classification instead of just using a single song-level feature vector produced by compressing the local features. Two variants of naive Bayes classifiers are studied based on the extensions of standard nearest neighbor and support vector machine classifiers. Experimental results have demonstrated superior performance achieved by the proposed naive Bayes classifiers for both music classification and retrieval as compared to the alternative methods.
Region based color image retrieval using curvelet transform
- Authors: Islam, Md , Zhang, Dengsheng , Lu, Guojun
- Date: 2010
- Type: Text , Conference paper
- Relation: Proceedings of the 9th Asian Conference on Computer Vision p. 448-457
- Full Text: false
- Reviewed:
- Description: An effective texture feature is an essential component of any content-based image retrieval system. In the past, spectral features, such as Gabor and wavelet, have shown better retrieval performance than many other statistical and structural features. Recent research on multi-resolution analysis has found that the curvelet transform captures texture properties, such as curves, lines, and edges, more accurately than Gabor filters. However, the texture feature extracted using the curvelet transform is not rotation invariant. This can degrade its retrieval performance significantly, especially in cases where there are many similar images with different orientations. This paper analyses the curvelet transform and derives a useful approach to extract rotation invariant curvelet features. Experimental results show that the new rotation invariant curvelet feature outperforms the curvelet feature without rotation invariance.
Automatic image annotation based on decision tree machine learning
- Authors: Jiang, Lixing , Hou, Jin , Zeng, Chen , Zhang, Dengsheng
- Date: 2009
- Type: Text , Conference paper
- Relation: Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery p. 170-175
- Full Text: false
- Reviewed:
- Description: With the rapid development of digital imaging technology, image annotation has become an important and challenging task in image retrieval. At present, many machine learning methods have been applied to the problem of automatic image annotation (AIA). However, there is an enormous semantic gap between low-level image features and high-level semantic concepts. Because of this gap, the annotation performance of existing methods is not satisfactory and needs to be further improved. This paper proposes an automatic annotation framework based on a novel decision tree-based Bayesian (DTB) machine learning algorithm. It is a hybrid approach that attempts to utilize the advantages of both the decision tree (DT) and Naive-Bayesian (NB) methods. We first segment an image into different regions and extract low-level features of each region. From these features, high-level semantic concepts are obtained using the DTB learning algorithm. Finally, experiments conducted on the Corel dataset demonstrate the effectiveness of DTB machine learning. The DTB can not only enhance the classification accuracy, but also associate low-level region features with high-level image concepts. This method combines the advantages of the Bayesian method and the DT. Moreover, this semantic interpretation capability is a natural simulation of human learning.
Rotation invariant curvelet features for texture image retrieval
- Authors: Islam, Md , Zhang, Dengsheng , Lu, Guojun
- Date: 2009
- Type: Text , Conference paper
- Relation: Proceedings of the 2009 IEEE International Conference on Multimedia and Expo p. 562-565
- Full Text: false
- Reviewed:
- Description: An effective texture feature is an essential component of any content-based image retrieval system. In the past, spectral features, such as Gabor and wavelet, have shown better retrieval performance than many other statistical and structural features. Recent research on multi-resolution analysis has found that the curvelet transform captures texture properties, such as curves, lines, and edges, more accurately than Gabor filters. However, the texture feature extracted using the curvelet transform is not rotation invariant. This can degrade its retrieval performance significantly, especially in cases where there are many similar images with different orientations. This paper analyses the curvelet transform and derives a useful approach to extract rotation invariant curvelet features. Experimental results show that the new rotation invariant curvelet feature outperforms the curvelet feature without rotation invariance.