Anti-aliasing deep image classifiers using novel depth adaptive blurring and activation function
- Authors: Hossain, Md Tahmid , Teng, Shyh , Lu, Guojun , Rahman, Mohammad Arifur , Sohel, Ferdous
- Date: 2023
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 536, no. (2023), p. 164-174
- Full Text: false
- Reviewed:
- Description: Deep convolutional networks are vulnerable to image translation or shift, partly due to common down-sampling layers, e.g., max-pooling and strided convolution. These operations violate the Nyquist sampling rate and cause aliasing. The textbook solution is low-pass filtering (blurring) before down-sampling, which can benefit deep networks as well. Even so, non-linearity units, such as ReLU, often re-introduce the problem, suggesting that blurring alone may not suffice. In this work, first, we analyse deep features with the Fourier transform and show that Depth Adaptive Blurring is more effective, as opposed to monotonic blurring. To this end, we propose a novel Depth Adaptive Blur-pool (DAB-pool) module to replace existing down-sampling methods. Second, we introduce a novel activation function – with a built-in low-pass filter, as an additional measure, to keep the problem from reappearing. From experiments, we observe generalisation on other forms of transformations and corruptions as well, e.g., rotation, scale, and noise. We evaluate our method under three challenging settings: (1) a variety of image translations; (2) adversarial attacks – both
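The textbook remedy the abstract describes — low-pass filtering before down-sampling — can be sketched generically. The snippet below is a minimal blur-pool in NumPy/SciPy; the 3x3 binomial kernel and stride are illustrative assumptions, and the paper's depth-adaptive kernel selection (DAB-pool) is not reproduced here.

```python
import numpy as np
from scipy.ndimage import convolve

def blur_pool(x, stride=2):
    """Low-pass filter with a 3x3 binomial kernel, then subsample.

    A generic anti-aliased down-sampling sketch; the paper's DAB-pool
    additionally adapts blur strength to layer depth (not shown here).
    """
    k = np.array([1.0, 2.0, 1.0])
    kernel = np.outer(k, k)
    kernel /= kernel.sum()                       # normalised binomial kernel
    blurred = convolve(x, kernel, mode="reflect")  # blur BEFORE subsampling
    return blurred[::stride, ::stride]
```

Because the blur happens before the stride, high frequencies that would otherwise alias into the subsampled grid are attenuated first.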
Detection of Malleefowl Mounds from Point Cloud Data
- Authors: Parvin, Nahida , Awrangjeb, Mohammad , Irvin, Marc , Florentine, Singarayer , Murshed, Manzur , Lu, Guojun
- Date: 2021
- Type: Text , Conference paper
- Relation: 2021 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2021, Gold Coast, 29 November to 1 December 2021
- Full Text: false
- Reviewed:
- Description: Airborne light detection and ranging (LiDAR) data have become a cost- and time-efficient means for estimating the size of timid fauna populations through the identification of artefacts that evidence their occurrence in a large, hostile geographic area. The unobtrusive detection method helps conservation managers to assess the stability of a population and to design appropriate conservation programs. Here we propose a mound (nest) detection method for Australia's native iconic bird, the Malleefowl, from point cloud data, which can be manipulated to act as a surrogate for population data. Existing detection methods are largely based on manual observations, and are therefore not efficient for covering large and remote areas. The proposed mound detection method identifies mound features based on the height and intensity values provided by the point cloud data. Each candidate mound point is initially selected by applying a height threshold utilising the classified ground points and their corresponding digital elevation model (DEM). Then, another threshold, based on an intensity range derived from ground-truth mound area analysis, is applied to the extracted initial mound points to find the final candidate mound points. These extracted points are then used to generate a binary mask in which the potential mound points are sparse. To connect those points, a morphological filter is applied to the binary image. To separate mounds from the remaining non-mound objects, a morphological cleaning operation and a connected component analysis are carried out on the mask. The non-mound objects are removed from the mask utilising the area property of mounds derived from the empirical analysis of ground-truth observations. Finally, the effectiveness of the proposed technique is calculated based on ground truth.
Although mound shapes and structures are highly variable in nature, our height- and intensity-based mound point extraction method detected 55% of the ground-truthed mounds. © 2021 IEEE.
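The pipeline summarised above — height threshold against the DEM, intensity-range threshold, morphological closing to connect sparse points, then connected-component area filtering — can be sketched as follows. All threshold values are illustrative placeholders, not the paper's calibrated parameters.

```python
import numpy as np
from scipy import ndimage

def candidate_mound_mask(height, intensity, dem,
                         h_thresh=0.5, i_range=(20, 60), min_area=30):
    """Sketch of the height/intensity thresholding + morphological cleaning.

    `h_thresh`, `i_range` and `min_area` are made-up example values.
    """
    above_ground = (height - dem) > h_thresh              # height threshold vs DEM
    in_range = (intensity >= i_range[0]) & (intensity <= i_range[1])
    mask = above_ground & in_range                        # sparse candidate points
    mask = ndimage.binary_closing(mask, iterations=2)     # connect sparse points
    labels, n = ndimage.label(mask)                       # connected components
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep_ids = 1 + np.flatnonzero(sizes >= min_area)      # area-based cleaning
    return np.isin(labels, keep_ids)
```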
Integrating line Weber local descriptor and deep feature for tire indentation mark image classification
- Authors: Liu, Ying , Che, Xin , Dong, Haitao , Li, Daxiang , Teng, Shyh , Lu, Guojun
- Date: 2021
- Type: Text , Conference paper
- Relation: 4th International Conference on Artificial Intelligence and Pattern Recognition, 4th International Conference on Artificial Intelligence and Pattern Recognition, AIPR 2021,Virtual, Online,17-19 September 2021, 2021, ACM International Conference Proceeding Series p. 56-61
- Full Text: false
- Reviewed:
- Description: Tire indentation mark matching is an essential tool used for the investigation of criminal cases and traffic incidents. As such images are unique and uncommon, there is a lack of dedicated databases and relevant research on this topic. This paper presents a feature extraction algorithm effective for tire indentation mark image description. The main contributions include: (1) Line feature Weber local descriptor (LWLD) is proposed, which uses the Gabor orientations instead of the original gradient orientation. This feature can describe texture information of tire indentation mark image more efficiently. (2) An attention model is constructed to produce attention feature map of tire indentation mark image. This attention feature map is then fused with LWLD resulting in a feature with more powerful representation capability. Experimental results prove that the combined use of LWLD and attention model greatly enhances the performance of tire indentation mark image matching tasks. © 2021 ACM.
Online dual dictionary learning for visual object tracking
- Authors: Cheng, Xu , Zhang, Yifeng , Zhou, Lin , Lu, Guojun
- Date: 2021
- Type: Text , Journal article
- Relation: Journal of Ambient Intelligence and Humanized Computing Vol. 12, no. 12 (2021), p. 10881-10896
- Full Text: false
- Reviewed:
- Description: Sparse representation methods have been widely applied to visual tracking. Most existing tracking algorithms based on sparse representation exploit the l0- or l1-norm for solving the sparse coefficients. However, this makes the solution very time-consuming. In this paper, we propose an effective dual dictionary learning model for visual tracking. The dictionary model is composed of a discriminative dictionary and an analytic dictionary; they work together to perform representation and discrimination simultaneously. First, we exploit the object states of the first ten frames of a video to initialize the dual dictionary. In the tracking phase, the dual dictionary model is updated alternately. Second, the local and global information of the object are integrated into the dual dictionary learning model. Sparse coefficients of each patch are used to encode the local structural information of the object. Furthermore, all the sparse coefficients within one object state form a global object representation. We develop a likelihood function that takes an adaptive threshold into consideration to de-noise the global representation. In addition, the object template is updated via an online scheme to adapt to object appearance changes. Experiments on a number of common benchmark test sets show that our approach is more effective than existing methods. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature.
Siamese network for object tracking with multi-granularity appearance representations
- Authors: Zhang, Zhuoyi , Zhang, Yifeng , Cheng, Xu , Lu, Guojun
- Date: 2021
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 118, no. (2021), p.
- Full Text: false
- Reviewed:
- Description: A reliable tracker has the ability to adapt to changes in objects over time, and is robust and accurate. We build such a tracker by extracting semantic features using robust Siamese networks and multi-granularity color features. It incorporates a semantic model that can capture high-quality semantic features and an appearance model that can describe objects effectively at the pixel, local and global levels. Furthermore, we propose a novel selective traverse algorithm to allocate weights to semantic models and appearance models dynamically for better tracking performance. During tracking, our tracker updates appearance representations for objects based on the recent tracking results. The proposed tracker operates at speeds that exceed the real-time requirement, and outperforms nearly all other state-of-the-art trackers on OTB-2013/2015 and VOT-2016/2017 benchmarks. © 2021 Elsevier Ltd
A Rotation invariant HOG descriptor for tire pattern image classification
- Authors: Liu, Ying , Ge, Yuxiang , Wang, Fuping , Liu, Qiqi , Lei, Yanbo , Zhang, Dengsheng , Lu, Guojun
- Date: 2019
- Type: Text , Conference proceedings
- Relation: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); Brighton, UK, 12-17 May 2019. p. 2412-2416
- Full Text: false
- Reviewed:
- Description: Texture features are important in describing tire pattern images, which provide useful clues for solving crime cases and traffic accidents. In this paper, we propose a novel texture feature extraction method based on HOG (Histogram of Oriented Gradient) and the dominant gradient (DG) in tire pattern images, named HOG-DG. The proposed HOG-DG is not only robust to illumination and scale changes but is also rotation-invariant. In the proposed HOG-DG, HOG features are first computed from circular local cells; the HOG features from an image are then concatenated and normalized using the DG to construct the HOG-DG feature. HOG-DG is used to train a support-vector-machine (SVM) classifier for tire pattern classification. Experimental results demonstrate its outstanding performance for tire pattern description.
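The DG normalisation idea — using the dominant gradient orientation to cancel out image rotation — can be illustrated as a circular shift of an orientation histogram so the dominant bin always comes first. The function below is a simplified sketch on a square patch (no circular cells, block structure or SVM stage), not the authors' implementation.

```python
import numpy as np

def dg_aligned_histogram(patch, bins=9):
    """Magnitude-weighted orientation histogram, circularly shifted so the
    dominant gradient (DG) bin comes first -- an illustrative take on
    rotation normalisation in the spirit of HOG-DG.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # unsigned orientation [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    dominant = int(np.argmax(hist))                   # dominant gradient bin
    return np.roll(hist, -dominant)                   # rotation-normalised histogram
```

With a bin count that divides 90° evenly (e.g. 8 bins over 180°), a 90°-rotated patch yields the same aligned histogram.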
BackNet: An Enhanced backbone network for accurate detection of objects with large scale variations
- Authors: Hossain, Md Tahmid , Teng, Shyh Wei , Lu, Guojun
- Date: 2019
- Type: Text , Book chapter
- Relation: Image and Video Technology. PSIVT 2019 p. 52-64
- Full Text: false
- Reviewed:
- Description: Deep Convolutional Neural Networks (CNNs) have induced significant progress in the field of computer vision, including object detection and classification. Two-stage detectors like Faster RCNN and its variants are found to be more accurate than their one-stage counterparts. Faster RCNN combines an ImageNet-pretrained backbone network (e.g., VGG16) and a Region Proposal Network (RPN) for object detection. Although Faster RCNN performs well on medium- and large-scale objects, detecting smaller ones with high accuracy while maintaining stable performance on larger objects remains a challenging task. In this work, we focus on designing a robust backbone network for Faster RCNN that is capable of detecting objects with large variations in scale. Considering the difficulties posed by small objects, our aim is to design a backbone network that allows signals extracted from small objects to be propagated right through to the deepest layers of the network. With this motivation, we propose a robust network, BackNet, which can be integrated as a backbone into any two-stage detector. We evaluate the performance of BackNet-Faster RCNN on the MS COCO dataset and show that the proposed method outperforms five contemporary methods.
Distortion robust image classification using deep convolutional neural network with discrete cosine transform
- Authors: Hossain, Md Tahmid , Teng, Shyh Wei , Zhang, Dengsheng , Lim, Suryani , Lu, Guojun
- Date: 2019
- Type: Text , Conference proceedings
- Relation: 2019 IEEE International Conference on Image Processing (ICIP);Taipei, Taiwan; 22-25 Sept, 2019 p. 659-663
- Full Text: false
- Reviewed:
- Description: Convolutional Neural Networks (CNNs) are highly effective for image classification. However, they are still vulnerable to image distortion: even a small amount of noise or blur can severely hamper their performance. Most work in the literature strives to mitigate this problem simply by fine-tuning a pre-trained CNN on mutually exclusive sets, or a union set, of distorted training data. This iterative fine-tuning process over all known types of distortion is exhaustive, and the network struggles to handle unseen distortions. In this work, we propose the distortion-robust DCT-Net, a Discrete Cosine Transform based module integrated into a deep network built on top of VGG16 [1]. Unlike other works in the literature, DCT-Net is "blind" to the distortion type and level in an image during both training and testing. DCT-Net is trained only once and applied in more generic situations without further retraining. We also extend the idea of dropout and present a training-adaptive version of it. We evaluate the proposed DCT-Net on a number of benchmark datasets. Our experimental results show that, once trained, DCT-Net not only generalizes well to a variety of unseen distortions but also outperforms other comparable networks in the literature.
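The transform-domain intuition — noise and blur live largely in the high-frequency DCT coefficients, so suppressing them yields distortion robustness — can be sketched as a DCT low-pass step. The `keep` cutoff and the placement of such a module inside a VGG16-style network are illustrative assumptions; the actual DCT-Net module differs.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_lowpass(feature_map, keep=8):
    """Keep only the lowest `keep` x `keep` DCT coefficients of a 2-D map.

    An illustrative transform-domain filtering step, not the paper's module.
    """
    coeffs = dctn(feature_map, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0                  # retain low-frequency block only
    return idctn(coeffs * mask, norm="ortho")
```

Setting `keep` to the full map size makes the operation an identity, so the cutoff directly controls how aggressively high-frequency distortion is removed.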
Enhanced transfer learning with ImageNet trained classification layer
- Authors: Shermin, Tasfia , Teng, Shyh Wei , Murshed, Manzur , Lu, Guojun , Sohel, Ferdous , Paul, Manoranjan
- Date: 2019
- Type: Text , Book chapter
- Relation: Image and Video Technology Chapter 12 p. 142-1455
- Full Text: false
- Reviewed:
- Description: Parameter fine-tuning is a transfer learning approach whereby learned parameters from a pre-trained source network are transferred to the target network and then fine-tuned. Prior research has shown that this approach is capable of improving task performance. However, the impact of the ImageNet pre-trained classification layer in parameter fine-tuning is mostly unexplored in the literature. In this paper, we propose a fine-tuning approach that retains the pre-trained classification layer. We employ layer-wise fine-tuning to determine which layers should be frozen for optimal performance. Our empirical analysis demonstrates that the proposed fine-tuning performs better than traditional fine-tuning. This finding indicates that the pre-trained classification layer holds less category-specific, and more global, information than previously believed. Thus, we hypothesize that the presence of this layer is crucial for growing network depth to adapt better to a new task. Our study shows that careful normalization and scaling are essential for creating harmony between the pre-trained and new layers for target domain adaptation. We evaluate the proposed depth-augmented networks for fine-tuning on several challenging benchmark datasets and show that they can achieve higher classification accuracy than contemporary transfer learning approaches.
Reversible data hiding in encrypted images based on image partition and spatial correlation
- Authors: Song, Chang , Zhang, Yifeng , Lu, Guojun
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 17th International Workshop on Digital Forensics and Watermarking, IWDW 2018; Jeju Island, South Korea; 22nd-24th October 2018; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 11378 LNCS, p. 180-194
- Full Text: false
- Reviewed:
- Description: Recently, increasing attention has been paid to reversible data hiding (RDH) in encrypted images because of its better protection of privacy compared with traditional RDH methods operated directly on original images. Among RDH algorithms, prediction-error expansion (PEE) has proved superior to other methods in terms of embedding capacity and distortion of the marked image, and multiple histograms modification (MHM) can realize adaptive, content-dependent selection of expansion bins in the modification of a sequence of histograms. Therefore, in this paper, we propose an efficient RDH method for encrypted images that combines PEE and MHM, and design a corresponding mode of image partition. We first divide the image into three parts: W (for embedding secret data), B (for embedding the least significant bits (LSB) of W) and G (for generating prediction-error histograms). Then, we apply PEE and MHM to embed the LSB of W, reserving space for the secret data. Next, we encrypt the image and change the LSB of W to realize the embedding of the secret data. In the extraction process, the reversibility of both the image and the secret data is guaranteed. The utilization of correlation between neighbouring pixels, and an embedding order decided by pixel smoothness in part W, contribute to the performance of our method. Compared to existing algorithms, experimental results show that the proposed method reduces distortion to the image at a given embedding capacity, especially at low embedding capacities.
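The PEE building block the method relies on can be sketched on a single scanline. This minimal version omits MHM, the W/B/G partition, encryption and overflow handling; the odd/even pixel layout, the two-neighbour predictor and the threshold `t` are illustrative choices, not the paper's.

```python
def pee_embed(pixels, bits, t=2):
    """Prediction-error expansion on a scanline: odd positions carry data,
    even positions are never modified so the predictor is reproducible."""
    out = list(pixels)
    it = iter(bits)
    for i in range(1, len(out) - 1, 2):
        pred = (out[i - 1] + out[i + 1]) // 2    # neighbours stay untouched
        e = out[i] - pred
        if -t <= e < t:                          # expandable error: embed one bit
            out[i] = pred + 2 * e + next(it, 0)
        elif e >= t:                             # shift so histogram bins stay disjoint
            out[i] = pred + e + t
        else:
            out[i] = pred + e - t
    return out

def pee_extract(marked, t=2):
    """Inverse pass: recover the bits and restore the original pixels."""
    out, bits = list(marked), []
    for i in range(1, len(out) - 1, 2):
        pred = (out[i - 1] + out[i + 1]) // 2
        e = out[i] - pred
        if -2 * t <= e < 2 * t:                  # expanded bin: read bit, halve error
            b = e % 2
            bits.append(b)
            out[i] = pred + (e - b) // 2
        elif e >= 2 * t:                         # undo the outward shift
            out[i] = pred + e - t
        else:
            out[i] = pred + e + t
    return out, bits
```

Because expanded errors land in [-2t, 2t) and shifted errors outside it, the two cases never collide, which is what makes the embedding reversible.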
A detector of structural similarity for multi-modal microscopic image registration
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
- Date: 2018
- Type: Text , Journal article
- Relation: Multimedia Tools and Applications Vol. 77, no. 6 (2018), p. 7675-7701
- Full Text: false
- Reviewed:
- Description: This paper presents a Detector of Structural Similarity (DSS) to minimize the visual differences between brightfield and confocal microscopic images. The context of this work is that it is very challenging to effectively register such images due to a low structural similarity in image contents. To address this issue, DSS aims to maximize the structural similarity by utilizing the intensity relationships among red-green-blue (RGB) channels in images. Technically, DSS can be combined with any multi-modal image registration technique in registering brightfield and confocal microscopic images. Our experimental results show that DSS significantly increases the visual similarity in such images, thereby improving the registration performance of an existing state-of-the-art multi-modal image registration technique by up to approximately 27%. © 2017, Springer Science+Business Media New York.
A kernel-based approach for content-based image retrieval
- Authors: Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun , Zhang, Dengsheng
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Image and Vision Computing New Zealand; Auckland, New Zealand; 19th-21st November 2018 p. 1-6
- Full Text: false
- Reviewed:
- Description: Content-based image retrieval (CBIR) is a popular approach to retrieving images based on a query. In CBIR, retrieval is executed based on the properties of image contents (e.g. gradient, shape, color, texture), which are generally encoded into image descriptors. Among the various image descriptors, histogram-based descriptors are very popular. However, they suffer from the limitation of coarse quantization. In contrast, kernel descriptors (KDES) have proven more effective than histogram-based descriptors in other applications, e.g. image classification. This is because, in the KDES framework, instead of quantizing pixel attributes, each pixel takes part equally in the similarity measurement between two images. In this paper, we propose an approach to applying the conventional KDES and its improved version to CBIR. In addition, we provide a detailed insight into the effectiveness of the improved kernel descriptors. Finally, our experimental results show that kernel descriptors are significantly more effective than histogram-based descriptors in CBIR.
A novel perceptual dissimilarity measure for image retrieval
- Authors: Shojanazeri, Hamid , Zhang, Dengsheng , Teng, Shyh , Aryal, Sunil , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 Vol. 2018-November, p. 1-6
- Full Text: false
- Reviewed:
- Description: Similarity measurement is an important research topic in image classification and retrieval. Given a type of image feature, a good similarity measure should be able to retrieve similar images from the database while discarding irrelevant images from the retrieval. Similarity measures in the literature are typically distance based: they measure the spatial distance between two feature vectors in a high-dimensional feature space. However, such measures carry no perceptual meaning and ignore the neighborhood influence in the similarity decision-making process. In this paper, we propose a novel dissimilarity measure that captures both the distance and the perceptual similarity of two image features in feature space. Results show that the proposed measure yields a significant improvement over the traditional distance-based similarity measures commonly used in the literature.
Classifier-free extraction of power line wires from point cloud data
- Authors: Awrangjeb, Mohammad , Gao, Yongsheng , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018; Canberra, Australia; 10th-13th December 2018
- Full Text: false
- Reviewed:
- Description: This paper proposes a classifier-free method for the extraction of power line wires from aerial point cloud data. It combines the advantages of both grid- and point-based processing of the input data. In addition to the non-ground point cloud data, the input to the proposed method includes the pylon locations, which are automatically extracted by a previous method. The proposed method first counts the number of wires in a span between two successive pylons using two masks: vertical and horizontal. Then, the initial wire segments are obtained and refined iteratively. Finally, the initial segments are extended at both ends and each individual wire is modelled as a 3D polynomial curve. Experimental results show that the object-based completeness and correctness are both 97%, while the point-based completeness and correctness are 99% and 88%, respectively.
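The final modelling step — fitting each extracted wire as a 3D polynomial curve — might look like the sketch below. Parameterising x and z as polynomials of the along-span coordinate y, and the polynomial degree, are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def fit_wire_curve(points, deg=2):
    """Fit a 3-D polynomial curve to one wire's points.

    `points` is an (N, 3) array of (x, y, z); x and z are each fitted as a
    degree-`deg` polynomial of y (catenary sag approximated by a parabola).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    px = np.polyfit(y, x, deg)    # horizontal alignment of the wire
    pz = np.polyfit(y, z, deg)    # vertical sag along the span
    return px, pz
```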
COREG : A corner based registration technique for multimodal images
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
- Date: 2018
- Type: Text , Journal article
- Relation: Multimedia Tools and Applications Vol. 77, no. 10 (2018), p. 12607-12634
- Full Text: false
- Reviewed:
- Description: This paper presents a COrner based REGistration technique for multimodal images (referred to as COREG). The proposed technique focuses on addressing large content and scale differences in multimodal images. Unlike traditional multimodal image registration techniques that rely on intensities or gradients for feature representation, we propose to use contour-based corners. First, curvature similarity between corners is explored for the first time for the purpose of multimodal image registration. Second, a novel local descriptor called Distribution of Edge Pixels Along Contour (DEPAC) is proposed to represent the edges in the neighborhood of corners. Third, a simple yet effective way of estimating scale differences is proposed by making use of geometric relationships between corner triplets from the reference and target images. Using a set of benchmark multimodal images and multimodal microscopic images, we demonstrate that our proposed technique outperforms a state-of-the-art multimodal image registration technique. © 2017, Springer Science+Business Media, LLC.
Enhancing image registration performance by incorporating distribution and spatial distance of local descriptors
- Authors: Lv, Guohua , Teng, Shyh , Lu, Guojun
- Date: 2018
- Type: Text , Journal article
- Relation: Pattern Recognition Letters Vol. 103, no. (2018), p. 46-52
- Full Text: false
- Reviewed:
- Description: A data dependency similarity measure called mp-dissimilarity has been recently proposed. Unlike ℓp-norm distance which is widely used in calculating the similarity between vectors, mp-dissimilarity takes into account the relative positions of the two vectors with respect to the rest of the data. This paper investigates the potential of mp-dissimilarity in matching local image descriptors. Moreover, three new matching strategies are proposed by considering both ℓp-norm distance and mp-dissimilarity. Our proposed matching strategies are extensively evaluated against ℓp-norm distance and mp-dissimilarity on a few benchmark datasets. Experimental results show that mp-dissimilarity is a promising alternative to ℓp-norm distance in matching local descriptors. The proposed matching strategies outperform both ℓp-norm distance and mp-dissimilarity in matching accuracy. One of our proposed matching strategies is comparable to ℓp-norm distance in terms of recall vs 1-precision. © 2018 Elsevier B.V.
Enhancing the effectiveness of local descriptor based image matching
- Authors: Hossain, Md Tahmid , Teng, Shyh , Zhang, Dengsheng , Lim, Suryani , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018; Canberra, Australia; 10th-13th December 2018 p. 1-8
- Full Text: false
- Reviewed:
- Description: Image registration has received great attention from researchers over the last few decades. SIFT (Scale Invariant Feature Transform), a local descriptor-based technique, is widely used for registering and matching images. To establish correspondences between images, SIFT uses a Euclidean distance ratio metric. However, this approach produces many incorrect matches, and eliminating these inaccurate matches has been a challenge. Various methods have been proposed in attempts to mitigate this problem. In this paper, we propose a scale and orientation harmony-based pruning method that improves the image matching process by successfully eliminating incorrect SIFT descriptor matches. Moreover, our technique can predict the image transformation parameters based on a novel adaptive clustering method with much higher matching accuracy. Our experimental results show that the proposed method achieves averages of approximately 16% and 10% higher matching accuracy compared to the traditional SIFT and a contemporary method, respectively.
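The Euclidean distance-ratio metric that this work builds on is Lowe's classic ratio test; a minimal version is sketched below. The paper's scale/orientation harmony-based pruning and adaptive clustering are further steps not shown here, and the 0.8 ratio is an illustrative default.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Match each descriptor in desc1 to its nearest neighbour in desc2,
    keeping the match only if the nearest distance is clearly smaller than
    the second-nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:   # unambiguous nearest neighbour
            matches.append((i, int(best)))
    return matches
```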
Image clustering using a similarity measure incorporating human perception
- Authors: Shojanazeri, Hamid , Aryal, Sunil , Teng, Shyh , Zhang, Dengsheng , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 p. 1-6
- Full Text: false
- Reviewed:
- Description: Clustering similar images is an important task in image processing and computer vision. It requires a measure to quantify the pairwise similarities of images, and the performance of a clustering algorithm depends on the choice of similarity measure. In this paper, we investigate the effectiveness of data-independent (distance-based), data-dependent (mass-based) and hybrid (dis)similarity measures in the image clustering task, using three benchmark image collections with different sets of features. Our K-Medoids clustering results show that using the hybrid Perceptual Dissimilarity Measure (PMD) produces better clustering results than the distance-based ℓp-norm and the mass-based mp-dissimilarity.
A hybrid data dependent dissimilarity measure for image retrieval
- Authors: Shojanazeri, Hamid , Teng, Shyh , Zhang, Dengsheng , Lu, Guojun
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 2017 International Conference on Digital Image Computing - Techniques and Applications (DICTA); Sydney, Australia; 29th November-1st December 2017 p. 141-148
- Full Text: false
- Reviewed:
- Description: In image retrieval, an effective dissimilarity (or similarity) measure is required to retrieve perceptually similar images. Minkowski-type distance is widely used for image retrieval; however, it has limitations. It focuses on the distance between image features and ignores the data distribution of those features, which can play an important role in measuring the perceptual similarity of images. To address this limitation, a data-dependent measure named mp, which calculates dissimilarity using the data distribution rather than geometric distance, has recently been proposed. It considers two instances in a sparse region to be more similar than two in a dense region. However, relying only on the data distribution and completely ignoring geometric distance raises other limitations: two perceptually dissimilar instances may be found similar because they are located in a sparse region, or vice versa. We propose a new hybrid dissimilarity measure, and experimental results show that it addresses these limitations.
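A mass-based dissimilarity of the kind described above counts, per dimension, how much of the data falls between the two instances; a sparse region then yields a small value. The sketch below shows that idea plus one simple way to hybridise it with an ℓp distance — the weighted-sum combination and `alpha` are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def mp_dissimilarity(x, y, data, p=2):
    """Mass-based dissimilarity: per dimension, the fraction of `data`
    points lying between x_i and y_i (inclusive), combined by a p-mean."""
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    mass = np.mean((data >= lo) & (data <= hi), axis=0)   # data fraction per dim
    return np.mean(mass ** p) ** (1.0 / p)

def hybrid_dissimilarity(x, y, data, p=2, alpha=0.5):
    """Illustrative hybrid: weighted sum of ℓp distance (geometry) and
    mp-dissimilarity (data distribution). `alpha` is a made-up weight."""
    d = np.linalg.norm(np.asarray(x) - np.asarray(y), ord=p)
    return alpha * d + (1.0 - alpha) * mp_dissimilarity(x, y, data, p)
```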
Cuboid segmentation for effective image retrieval
- Authors: Murshed, Manzur , Teng, Shyh , Lu, Guojun
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 2017 International Conference on Digital Image Computing : Techniques and Applications (DICTA); Sydney, Australia; 29th November-1st December 2017 p. 884-891
- Full Text: false
- Reviewed:
- Description: Region-based image retrieval has been proven effective in finding relevant images. In this paper, we propose a cuboid image segmentation method which results in rectangular image partitions. Rectangular partitions are more suitable for image compression, retrieval and other image operations. We apply the partitions to image retrieval in this paper. Our experimental results have shown that (1) the proposed partitioning method is effective in segmenting images into meaningful rectangles; (2) using colour partitions for image retrieval is more effective than using whole images; and (3) the partitioned approach has the additional advantage of letting users select certain objects/colours as queries to find more relevant images/objects. These three advantages could be important in crime scene investigation image indexing and retrieval. Moreover, the proposed technique is amenable to compressed-domain applications.