Integrated generalized zero-shot learning for fine-grained classification
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 122 (2022)
- Description: Embedding learning (EL) and feature synthesizing (FS) are two popular categories of fine-grained generalized zero-shot learning (GZSL) methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features neglect either direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and an FS sub-network. Consequently, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. © 2021 Elsevier Ltd
Online dual dictionary learning for visual object tracking
- Authors: Cheng, Xu , Zhang, Yifeng , Zhou, Lin , Lu, Guojun
- Date: 2021
- Type: Text , Journal article
- Relation: Journal of Ambient Intelligence and Humanized Computing Vol. 12, no. 12 (2021), p. 10881-10896
- Description: Sparse representation methods have been widely applied to visual tracking. Most existing tracking algorithms based on sparse representation use the ℓ0- or ℓ1-norm to solve for the sparse coefficients. However, this makes computing the solution very time consuming. In this paper, we propose an effective dual dictionary learning model for visual tracking. The dictionary model is composed of a discriminative dictionary and an analytic dictionary; they work together to perform representation and discrimination simultaneously. First, we exploit the object states of the first ten frames of a video to initialize the dual dictionary. In the tracking phase, the dual dictionary model is updated alternately. Second, the local and global information of the object is integrated into the dual dictionary learning model. Sparse coefficients of the patch are used to encode the local structural information of the object. Furthermore, all the sparse coefficients within one object state form a global object representation. We develop a likelihood function that takes an adaptive threshold into consideration to de-noise the global representation. In addition, the object template is updated via an online scheme to adapt to changes in object appearance. Experiments on a number of common benchmark test sets show that our approach is more effective than existing methods. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature.
Siamese network for object tracking with multi-granularity appearance representations
- Authors: Zhang, Zhuoyi , Zhang, Yifeng , Cheng, Xu , Lu, Guojun
- Date: 2021
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 118 (2021)
- Description: A reliable tracker adapts to changes in objects over time and is robust and accurate. We build such a tracker by extracting semantic features using robust Siamese networks and multi-granularity color features. It incorporates a semantic model that can capture high-quality semantic features and an appearance model that can effectively describe the object at pixel, local and global levels. Furthermore, we propose a novel selective traverse algorithm to allocate weights to semantic models and appearance models dynamically for better tracking performance. During tracking, our tracker updates appearance representations for objects based on the recent tracking results. The proposed tracker operates at speeds that exceed the real-time requirement, and outperforms nearly all other state-of-the-art trackers on OTB-2013/2015 and VOT-2016/2017 benchmarks. © 2021 Elsevier Ltd
A detector of structural similarity for multi-modal microscopic image registration
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
- Date: 2018
- Type: Text , Journal article
- Relation: Multimedia Tools and Applications Vol. 77, no. 6 (2018), p. 7675-7701
- Description: This paper presents a Detector of Structural Similarity (DSS) to minimize the visual differences between brightfield and confocal microscopic images. The context of this work is that it is very challenging to effectively register such images due to a low structural similarity in image contents. To address this issue, DSS aims to maximize the structural similarity by utilizing the intensity relationships among red-green-blue (RGB) channels in images. Technically, DSS can be combined with any multi-modal image registration technique in registering brightfield and confocal microscopic images. Our experimental results show that DSS significantly increases the visual similarity in such images, thereby improving the registration performance of an existing state-of-the-art multi-modal image registration technique by up to approximately 27%. © 2017, Springer Science+Business Media New York.
COREG : A corner based registration technique for multimodal images
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
- Date: 2018
- Type: Text , Journal article
- Relation: Multimedia Tools and Applications Vol. 77, no. 10 (2018), p. 12607-12634
- Description: This paper presents a COrner based REGistration technique for multimodal images (referred to as COREG). The proposed technique focuses on addressing large content and scale differences in multimodal images. Unlike traditional multimodal image registration techniques that rely on intensities or gradients for feature representation, we propose to use contour-based corners. First, curvature similarity between corners is explored for the first time for the purpose of multimodal image registration. Second, a novel local descriptor called Distribution of Edge Pixels Along Contour (DEPAC) is proposed to represent the edges in the neighborhood of corners. Third, a simple yet effective way of estimating scale difference is proposed by making use of geometric relationships between corner triplets from the reference and target images. Using a set of benchmark multimodal images and multimodal microscopic images, we demonstrate that our proposed technique outperforms a state-of-the-art multimodal image registration technique. © 2017, Springer Science+Business Media, LLC.
Enhancing image registration performance by incorporating distribution and spatial distance of local descriptors
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
- Date: 2018
- Type: Text , Journal article
- Relation: Pattern Recognition Letters Vol. 103 (2018), p. 46-52
- Description: A data-dependent dissimilarity measure called mp-dissimilarity has recently been proposed. Unlike ℓp-norm distance, which is widely used in calculating the similarity between vectors, mp-dissimilarity takes into account the relative positions of the two vectors with respect to the rest of the data. This paper investigates the potential of mp-dissimilarity in matching local image descriptors. Moreover, three new matching strategies are proposed by considering both ℓp-norm distance and mp-dissimilarity. Our proposed matching strategies are extensively evaluated against ℓp-norm distance and mp-dissimilarity on a few benchmark datasets. Experimental results show that mp-dissimilarity is a promising alternative to ℓp-norm distance in matching local descriptors. The proposed matching strategies outperform both ℓp-norm distance and mp-dissimilarity in matching accuracy. One of our proposed matching strategies is comparable to ℓp-norm distance in terms of recall vs 1-precision. © 2018 Elsevier B.V.
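The abstract above says only that mp-dissimilarity depends on where two vectors sit relative to the rest of the data. As a rough illustration, the sketch below assumes one commonly cited form of the measure, in which each dimension contributes the fraction of data points falling between the two values; this formula, the function names `region_mass` and `mp_dissimilarity`, and the toy data are all assumptions for illustration and are not taken from the paper.

```python
def region_mass(data, dim, a, b):
    """Count data points whose value in dimension `dim` lies between a and b (inclusive)."""
    lo, hi = min(a, b), max(a, b)
    return sum(1 for row in data if lo <= row[dim] <= hi)


def mp_dissimilarity(x, y, data, p=2.0):
    """Assumed form: ((1/d) * sum_i (mass_i / n) ** p) ** (1/p)."""
    n, d = len(data), len(x)
    total = sum((region_mass(data, i, x[i], y[i]) / n) ** p for i in range(d))
    return (total / d) ** (1.0 / p)


# Toy 1-D data: four points packed closely together, two far apart.
data = [[0.0], [0.1], [0.2], [0.3], [5.0], [10.0]]
dense_pair = mp_dissimilarity([0.0], [0.3], data)    # region covers 4 of 6 points
sparse_pair = mp_dissimilarity([5.0], [10.0], data)  # region covers 2 of 6 points
```

Under this assumed form, the pair in the dense region comes out as more dissimilar than the pair in the sparse region, even though its ℓp-norm distance is far smaller.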
Enhancing SIFT-based image registration performance by building and selecting highly discriminating descriptors
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
- Date: 2016
- Type: Text , Journal article
- Relation: Pattern Recognition Letters Vol. 84 (2016), p. 156-162
- Description: In this paper we will investigate the gradient utilization in building SIFT (Scale Invariant Feature Transform)-like descriptors for image registration. There are generally two types of gradient information, i.e. gradient magnitude and gradient occurrence, which can be used for building SIFT-like descriptors. We will provide a theoretical analysis on the effectiveness of each of the two types of gradient information when used individually. Based on our analysis, we will propose a novel technique which systematically uses both types of gradient information together for image registration. Moreover, we will propose a strategy to select keypoint matches with a higher discrimination. The proposed technique can be used for both mono-modal and multi-modal image registration. Our experimental results show that the proposed technique improves registration accuracy over existing SIFT-like descriptors. © 2016 Elsevier B.V.
Effective and efficient contour-based corner detectors
- Authors: Teng, Shyh , Najmus Sadat, Rafi , Lu, Guojun
- Date: 2015
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 48, no. 7 (2015), p. 2185-2197
- Description: Corner detection is an essential operation in many computer vision applications. Among the contour-based corner detectors in the literature, the Chord-to-Point Distance Accumulation (CPDA) detector is reported to have one of the highest repeatability in detecting robust corners and the lowest localization error. However, based on our analysis, we found that the CPDA detector often fails to accurately detect the true corners when a curve has multiple corners but the sharpness of one or a few of them is much more prominent than the rest. This detector also might not perform well when the corners are closely located. Furthermore, the CPDA detector is also computationally very expensive. To overcome these weaknesses, we propose two effective and efficient corner detectors using simple triangular theory and distance calculation. Our experimental results show that our proposed detectors outperform CPDA and nine other existing corner detectors in terms of repeatability. Our proposed detectors also have a relatively low or comparable localization error and are computationally more efficient. © 2015 Elsevier Ltd.
Multimodal image registration technique based on improved local feature descriptors
- Authors: Teng, Shyh , Hossain, Tanvir , Lu, Guojun
- Date: 2015
- Type: Text , Journal article
- Relation: Journal of Electronic Imaging Vol. 24, no. 1 (2015)
- Description: Multimodal image registration has received significant research attention over the past decade, and the majority of the techniques are global in nature. Although local techniques are widely used for general image registration, there are only limited studies on them for multimodal image registration. Scale invariant feature transform (SIFT) is a well-known general image registration technique. However, SIFT descriptors are not invariant to multimodality. We propose a SIFT-based technique that is modality invariant and still retains the strengths of local techniques. Moreover, our proposed histogram weighting strategies also improve the accuracy of descriptor matching, which is an important image registration step. As a result, our proposed strategies can not only improve the multimodal registration accuracy but also have the potential to improve the performance of all SIFT-based applications, e.g., general image registration and object recognition.
A performance review of recent corner detectors
- Authors: Awrangjeb, Mohammad , Lu, Guojun
- Date: 2013
- Type: Text , Conference paper
- Relation: International Conference on Digital Image Computing: Techniques and Applications, 26 November 2013 to 28 November 2013 p. 157-164
- Description: Contour-based corner detectors directly or indirectly estimate a significance measure (e.g., curvature) on the points of a planar curve and select the curvature extrema points as corners. A number of promising contour-based corner detectors have recently been proposed. They mainly differ in how the curvature is estimated at each point of the given curve. As the curvature on a digital curve can only be approximated, it is important to estimate a curvature that remains stable against significant noise on the curve, for example, geometric transformations and compression. Moreover, in many applications, for instance, content-based image retrieval, a fast corner detector is a prerequisite. The time a corner detector takes to detect corners in a given image is therefore also a primary characteristic. In addition, different authors evaluated their detectors on different platforms using different evaluation systems. Evaluation systems that depend on human judgements and visual identification of corners are manual and too subjective. Applying a manual system to a large test database would be expensive. Therefore, it is important to evaluate the detectors on a common platform using an automatic evaluation system. This paper first reviews the six most recent, high-performing corner detectors and analyses their theoretical running time. Then it uses an automatic evaluation system to analyse their performance. Both robustness to noise and efficiency are estimated to rank the detectors.
Building roof plane extraction from LIDAR data
- Authors: Awrangjeb, Mohammad , Lu, Guojun
- Date: 2013
- Type: Text , Conference paper
- Relation: 2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA)
- Description: This paper presents a new segmentation technique to use LIDAR point cloud data for automatic extraction of building roof planes. The raw LIDAR points are first classified into two major groups: ground and non-ground points. The ground points are used to generate a 'building mask' in which the black areas represent the ground where there are no laser returns below a certain height. The non-ground points are segmented to extract the planar roof segments. First, the building mask is divided into small grid cells. The cells containing the black pixels are clustered such that each cluster represents an individual building or tree. Second, the non-ground points within a cluster are segmented based on their coplanarity and neighbourhood relations. Third, the planar segments are refined using a rule-based procedure that assigns the common points among the planar segments to the appropriate segments. Finally, another rule-based procedure is applied to remove tree planes which are generally small in size and randomly oriented. Experimental results on three Australian sites have shown that the proposed method offers high building detection and roof plane extraction rates.
Efficient nonlinear classification via low-rank regularised least squares
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Journal article
- Relation: Neural Computing and Applications Vol. 22, no. 7-8 (2013), p. 1279-1289
- Description: We revisit the classical technique of regularised least squares (RLS) for nonlinear classification in this paper. Specifically, we focus on a low-rank formulation of the RLS, which has linear time complexity in the size of data set only, independent of both the number of classes and number of features. This makes low-rank RLS particularly suitable for problems with large data and moderate feature dimensions. Moreover, we have proposed a general theorem for obtaining the closed-form estimation of prediction values on a holdout validation set given the low-rank RLS classifier trained on the whole training data. It is thus possible to obtain an error estimate for each parameter setting without retraining and greatly accelerate the process of cross-validation for parameter selection. Experimental results on several large-scale benchmark data sets have shown that low-rank RLS achieves comparable classification performance while being much more efficient than standard kernel SVM for nonlinear classification. The improvement in efficiency is more evident for data sets with higher dimensions.
Learning sparse kernel classifiers for multi-instance classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Journal article
- Relation: IEEE Transactions on Neural Networks and Learning Systems Vol. 24, no. 9 (2013), p. 1377-1389
- Description: We propose a direct approach to learning sparse kernel classifiers for multi-instance (MI) classification to improve efficiency while maintaining predictive accuracy. The proposed method builds on a convex formulation for MI classification by considering the average score of individual instances for bag-level prediction. In contrast, existing formulations used the maximum score of individual instances in each bag, which leads to nonconvex optimization problems. Based on the convex MI framework, we formulate a sparse kernel learning algorithm by imposing additional constraints on the objective function to enforce the maximum number of expansions allowed in the prediction function. The formulated sparse learning problem for the MI classification is convex with respect to the classifier weights. Therefore, we can employ an effective optimization strategy to solve the optimization problem that involves the joint learning of both the classifier and the expansion vectors. In addition, the proposed formulation can explicitly control the complexity of the prediction model while still maintaining competitive predictive performance. Experimental results on benchmark data sets demonstrate that our proposed approach is effective in building very sparse kernel classifiers while achieving comparable performance to the state-of-the-art MI classifiers.
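The contrast drawn in the abstract above, between averaging instance scores and taking their maximum for bag-level prediction, can be sketched in a few lines. This is a minimal illustration only: it assumes a plain linear instance scorer rather than the kernel classifier the abstract describes, and the function names and toy bag are hypothetical.

```python
def instance_score(weights, bias, instance):
    """Linear score of a single instance (assumed scorer, for illustration)."""
    return sum(w * v for w, v in zip(weights, instance)) + bias


def bag_score_mean(weights, bias, bag):
    """Bag-level score as the average of the instance scores."""
    scores = [instance_score(weights, bias, inst) for inst in bag]
    return sum(scores) / len(scores)


def bag_score_max(weights, bias, bag):
    """Bag-level score as the maximum instance score."""
    return max(instance_score(weights, bias, inst) for inst in bag)


# A toy bag of three 2-D instances.
bag = [[1.0, 0.0], [0.0, 1.0], [3.0, 1.0]]
weights, bias = [1.0, -1.0], 0.0
mean_score = bag_score_mean(weights, bias, bag)  # instance scores are 1, -1, 2
max_score = bag_score_max(weights, bias, bag)
```

With a linear scorer, the mean-based bag score is itself linear in the weights, which is one way to see why a formulation built on it can stay convex; the max-based score does not have this property.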
Maximizing structural similarity in multimodal biomedical microscopic images for effective registration
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun , Lackmann, Martin
- Date: 2013
- Type: Text , Conference paper
- Relation: 2013 IEEE International Conference on Multimedia and Expo (ICME)
- Description: Multimodal image registration (MMIR) is the alignment of contents in images captured from different sensors or instruments. MMIR is important in medical applications as it enables the visualization of the complementary contents in biomedical microscopic images. The registration for such images can be challenging as the structures of their contents are usually only partially similar. Thus in this paper, we propose a new method to maximize the structural similarity of the contents in such images by utilizing intensity relationships among Red-Green-Blue color channels. Our experimental results will demonstrate that our proposed method substantially improves the accuracy of registering such images as compared to the state-of-the-art methods.
Optimizing cepstral features for audio classification
- Authors: Fu, Zhouyu , Lu, Guojun , Ting, Kaiming , Zhang, Dengsheng
- Date: 2013
- Type: Text , Conference paper
- Relation: International Joint Conference on Artificial Intelligence p. 1330-1336
- Description: Cepstral features have been widely used in audio applications. Domain knowledge has played an important role in designing different types of cepstral features proposed in the literature. In this paper, we present a novel approach for learning optimized cepstral features directly from audio data to better discriminate between different categories of signals in classification tasks. We employ multi-layer feedforward neural networks to model the cepstral feature extraction process. The network weights are initialized to replicate a reference cepstral feature such as the mel frequency cepstral coefficient. We then propose an embedded approach that integrates feature learning with the training of a support vector machine (SVM) classifier. A single optimization problem is formulated where the feature and classifier variables are optimized simultaneously so as to refine the initial features and minimize the classification risk. Experimental results have demonstrated the effectiveness of the proposed feature learning approach, outperforming competing methods by a large margin on benchmark data.
Structural image retrieval using automatic image annotation and region based inverted file
- Authors: Zhang, Dengsheng , Islam, Md , Lu, Guojun
- Date: 2013
- Type: Text , Journal article
- Relation: Journal of Visual Communication and Image Representation Vol. 24, no. 7 (2013), p. 1087-1098
- Description: Image retrieval has lagged far behind text retrieval despite more than two decades of intensive research effort. Most of the research on image retrieval in the last two decades is on content based image retrieval or image retrieval based on low level features. Recent research in this area focuses on semantic image retrieval using automatic image annotation. Most semantic image retrieval techniques in the literature, however, treat an image as a bag of features/words while ignoring the structural or spatial information in the image. In this paper, we propose a structural image retrieval method based on automatic image annotation and a region based inverted file. In the proposed system, regions in an image are treated the same way as keywords in a structural text document, semantic concepts are learnt from image data to label image regions as keywords, and a weight is assigned to each keyword according to spatial position and relationship. As a result, images are indexed and retrieved in the same way as in structural document retrieval. Specifically, images are broken down into regions which are represented using colour, texture and shape features. Region features are then quantized to create visual dictionaries which are similar to monolingual dictionaries like English or Chinese dictionaries. In the next step, a semantic dictionary similar to a bilingual dictionary like the English–Chinese dictionary is learnt to map image regions to semantic concepts. Finally, images are indexed and retrieved using a novel region based inverted file data structure. Results show the proposed method has a significant advantage over the widely used Bayesian annotation models.
The impact of global and local features on multiple sequence alignment clustering-based near-duplicate video retrieval
- Authors: Wang, Yandan , Lu, Guojun , Belkhatir, Mohammed , Messom, Christopher
- Date: 2013
- Type: Text , Conference paper
- Relation: 14th Pacific-Rim Conference on Multimedia p. 669-677
- Description: Traditionally, the performance of Near-Duplicate Video Retrieval (NDVR) is enhanced through different video features, matching schemes and indexing methods. Video features have been intensively investigated and it has been shown that local features outperform global features in terms of accuracy. However, local features are computationally expensive. Indexing structures are therefore introduced to assist in scaling up, although in most cases accuracy drops, slightly or dramatically, when indexing approaches are used. Recent progress shows that NDVR based on clustering can reduce the search space while maintaining retrieval accuracy equivalent to that of non-clustering-based NDVR. In this paper, we continue to evaluate clustering-based NDVR, but using popular global and local features. Before conducting NDVR, the dataset is pre-processed offline into groups using a clustering algorithm so that near-duplicate videos (NDVs) are assembled in the same cluster. Each cluster is represented by a member video or the centroid. The query video is then compared to the representative videos instead of all videos in the database (as in non-clustering-based NDVR). Our experiment shows that clustering-based NDVR using global and local features outperforms non-clustering-based NDVR in terms of both retrieval accuracy and speed.
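The retrieval flow described in the abstract above (offline clustering, one representative per cluster, query compared against representatives only) can be sketched as follows. This is a hedged illustration, not the authors' system: it assumes each video is already summarised as a fixed-length feature vector, that clusters have been formed offline, that centroids serve as representatives, and that Euclidean distance is the matching measure. All names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_representatives(clusters):
    """Map each cluster id to the centroid of its member feature vectors."""
    return {cid: centroid(list(members.values())) for cid, members in clusters.items()}


def retrieve(query, clusters, representatives):
    """Pick the nearest representative, then rank only that cluster's videos."""
    best = min(representatives, key=lambda cid: euclidean(query, representatives[cid]))
    members = clusters[best]
    ranked = sorted(members, key=lambda vid: euclidean(query, members[vid]))
    return best, ranked


# Toy clusters assumed to come from an offline clustering step.
clusters = {
    "a": {"v1": [0.0, 0.0], "v2": [0.2, 0.1]},
    "b": {"v3": [5.0, 5.0], "v4": [5.1, 4.9]},
}
representatives = build_representatives(clusters)
best_cluster, ranked_videos = retrieve([0.15, 0.1], clusters, representatives)
```

The query is compared with two representatives and then with the two members of the chosen cluster, rather than with every video in the collection, which is where the reduction in search space comes from.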
A review on automatic image annotation techniques
- Authors: Zhang, Dengsheng , Islam, Md , Lu, Guojun
- Date: 2012
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 45, no. 1 (2012), p. 346-362
- Description: Nowadays, more and more images are available. However, finding a required image is a challenging task for an ordinary user. A large amount of research on image retrieval has been carried out in the past two decades. Traditionally, research in this area has focused on content based image retrieval. However, recent research shows that there is a semantic gap between content based image retrieval and image semantics understandable by humans. As a result, research in this area has shifted to bridging the semantic gap between low level image features and high level semantics. The typical method of bridging the semantic gap is automatic image annotation (AIA), which extracts semantic features using machine learning techniques. In this paper, we focus on this latest development in image retrieval and provide a comprehensive survey on automatic image annotation. We analyse key aspects of the various AIA methods, including both feature extraction and semantic learning methods. Major methods are discussed and illustrated in detail. We report our findings and provide future research directions in the AIA area in the conclusions.
Achieving high multi-modal registration performance using simplified Hough-transform with improved symmetric-SIFT
- Authors: Hossain, Md Tanvir , Teng, Shyh , Lu, Guojun
- Date: 2012
- Type: Text , Conference paper
- Relation: 14th International Conference on Digital Image Computing Techniques and Applications, DICTA 2012
- Description: The traditional way of using Hough Transform with SIFT is for the purpose of reliable object recognition. However, it cannot be effectively applied to image registration in the same way as the recall rate can be significantly lower. In this paper, we propose an alternative implementation of Hough Transform that can be used with Improved Symmetric-SIFT for multi-modal image registration. Our experimental results show that the proposed technique of applying Hough Transform can significantly improve the key-point matching as well as registration accuracy by utilizing aggregated information from key-points throughout the input images.
Performance comparisons of contour-based corner detectors
- Authors: Awrangjeb, Mohammad , Lu, Guojun , Fraser, Clive
- Date: 2012
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 21, no. 9 (2012), p. 4167-4179
- Description: Corner detectors have many applications in computer vision and image identification and retrieval. Contour-based corner detectors directly or indirectly estimate a significance measure (e.g., curvature) on the points of a planar curve, and select the curvature extrema points as corners. While an extensive number of contour-based corner detectors have been proposed over the last four decades, there is no comparative study of recently proposed detectors. This paper is an attempt to fill this gap. The general framework of contour-based corner detection is presented, and two major issues, curve smoothing and curvature estimation, which have major impacts on corner detection performance, are discussed. A number of promising detectors are compared using both automatic and manual evaluation systems on two large datasets. It is observed that while the detectors using indirect curvature estimation techniques are more robust, the detectors using direct curvature estimation techniques are faster.