Bidirectional mapping coupled GAN for generalized zero-shot learning
- Shermin, Tasfia, Teng, Shyh, Sohel, Ferdous, Murshed, Manzur, Lu, Guojun
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 31, no. (2022), p. 721-733
- Full Text:
- Reviewed:
- Description: Bidirectional mapping-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen-unseen classes and preserving the distinction between seen-unseen classes is crucial for GZSL methods. However, existing methods only learn the underlying distribution of seen data, although unseen class semantics are available in the GZSL problem setting. Most methods neglect retaining seen-unseen classes distinction and use the learned distribution to recognize seen and unseen data. Consequently, they do not perform well. In this work, we utilize the available unseen class semantics alongside seen class semantics and learn joint distribution through a strong visual-semantic coupling. We propose a bidirectional mapping coupled generative adversarial network (BMCoGAN) by extending the concept of the coupled generative adversarial network into a bidirectional mapping model. We further integrate a Wasserstein generative adversarial optimization to supervise the joint distribution learning. We design a loss optimization for retaining distinctive information of seen-unseen classes in the synthesized features and reducing bias towards seen classes, which pushes synthesized seen features towards real seen features and pulls synthesized unseen features away from real seen features. We evaluate BMCoGAN on benchmark datasets and demonstrate its superior performance against contemporary methods. © 1992-2012 IEEE.
Few-shot image classification : current status and research trends
- Liu, Ying, Zhang, Hengchang, Zhang, Weidong, Lu, Guojun, Tian, Qi, Ling, Nam
- Authors: Liu, Ying , Zhang, Hengchang , Zhang, Weidong , Lu, Guojun , Tian, Qi , Ling, Nam
- Date: 2022
- Type: Text , Journal article , Review
- Relation: Electronics (Switzerland) Vol. 11, no. 11 (2022), p.
- Full Text:
- Reviewed:
- Description: Conventional image classification methods usually require a large number of training samples for the training model. However, in practical scenarios, the amount of available sample data is often insufficient, which easily leads to overfitting in network construction. Few-shot learning provides an effective solution to this problem and has been a hot research topic. This paper provides an intensive survey on the state-of-the-art techniques in image classification based on few-shot learning. According to the different deep learning mechanisms, the existing algorithms are divided into four categories: transfer learning based, meta-learning based, data augmentation based, and multimodal based methods. Transfer learning based methods transfer useful prior knowledge from the source domain to the target domain. Meta-learning based methods employ past prior knowledge to guide the learning of new tasks. Data augmentation based methods expand the amount of sample data with auxiliary information. Multimodal based methods use the information of the auxiliary modal to facilitate the implementation of image classification tasks. This paper also summarizes the few-shot image datasets available in the literature, and experimental results tested by some representative algorithms are provided to compare their performance and analyze their pros and cons. In addition, the applications of existing research outcomes on few-shot image classification in different practical fields are discussed. Finally, a few future research directions are identified. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
Integrated generalized zero-shot learning for fine-grained classification
- Shermin, Tasfia, Teng, Shyh, Sohel, Ferdous, Murshed, Manzur, Lu, Guojun
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 122, no. (2022), p.
- Full Text:
- Reviewed:
- Description: Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and a FS sub-network. Consequently, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. © 2021 Elsevier Ltd
A novel fusion approach in the extraction of kernel descriptor with improved effectiveness and efficiency
- Karmakar, Priyabrata, Teng, Shyh, Lu, Guojun, Zhang, Dengsheng
- Authors: Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun , Zhang, Dengsheng
- Date: 2021
- Type: Text , Journal article
- Relation: Multimedia Tools and Applications Vol. 80, no. 10 (Apr 2021), p. 14545-14564
- Full Text:
- Reviewed:
- Description: Image representation using feature descriptors is crucial. A number of histogram-based descriptors are widely used for this purpose. However, histogram-based descriptors have certain limitations and kernel descriptors (KDES) are proven to overcome them. Moreover, the combination of more than one KDES performs better than an individual KDES. Conventionally, KDES fusion is performed by concatenating them after the gradient, colour and shape descriptors have been extracted. This approach has limitations in regard to the efficiency as well as the effectiveness. In this paper, we propose a novel approach to fuse different image features before the descriptor extraction, resulting in a compact descriptor which is efficient and effective. In addition, we have investigated the effect on the proposed descriptor when texture-based features are fused along with the conventionally used features. Our proposed descriptor is examined on two publicly available image databases and shown to provide outstanding performances.
Adversarial network with multiple classifiers for open set domain adaptation
- Shermin, Tasfia, Lu, Guojun, Teng, Shyh, Murshed, Manzur, Sohel, Ferdous
- Authors: Shermin, Tasfia , Lu, Guojun , Teng, Shyh , Murshed, Manzur , Sohel, Ferdous
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 23, no. (2021), p. 2732-2744
- Full Text:
- Reviewed:
- Description: Domain adaptation aims to transfer knowledge from a domain with adequate labeled samples to a domain with scarce labeled samples. Prior research has introduced various open set domain adaptation settings in the literature to extend the applications of domain adaptation methods in real-world scenarios. This paper focuses on the type of open set domain adaptation setting where the target domain has both private ('unknown classes') label space and the shared ('known classes') label space. However, the source domain only has the 'known classes' label space. Prevalent distribution-matching domain adaptation methods are inadequate in such a setting that demands adaptation from a smaller source domain to a larger and diverse target domain with more classes. For addressing this specific open set domain adaptation setting, prior research introduces a domain adversarial model that uses a fixed threshold for distinguishing known from unknown target samples and falls short in handling negative transfers. We extend their adversarial model and propose a novel adversarial domain adaptation model with multiple auxiliary classifiers. The proposed multi-classifier structure introduces a weighting module that evaluates distinctive domain characteristics to assign the target samples weights that better represent whether they are likely to belong to the known or unknown classes. This encourages positive transfers during adversarial training and simultaneously reduces the domain gap between the shared classes of the source and target domains. A thorough experimental investigation shows that our proposed method outperforms existing domain adaptation methods on a number of domain adaptation datasets. © 1999-2012 IEEE.
Robust image classification using a low-pass activation function and DCT augmentation
- Hossain, Md Tahmid, Teng, Shyh, Sohel, Ferdous, Lu, Guojun
- Authors: Hossain, Md Tahmid , Teng, Shyh , Sohel, Ferdous , Lu, Guojun
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 86460-86474
- Full Text:
- Reviewed:
- Description: Convolutional Neural Network's (CNN's) performance disparity on clean and corrupted datasets has recently come under scrutiny. In this work, we analyse common corruptions in the frequency domain, i.e., High Frequency corruptions (HFc, e.g., noise) and Low Frequency corruptions (LFc, e.g., blur). Although a simple solution to HFc is low-pass filtering, ReLU - a widely used Activation Function (AF), does not have any filtering mechanism. In this work, we instill low-pass filtering into the AF (LP-ReLU) to improve robustness against HFc. To deal with LFc, we complement LP-ReLU with Discrete Cosine Transform based augmentation. LP-ReLU, coupled with DCT augmentation, enables a deep network to tackle the entire spectrum of corruption. We use CIFAR-10-C and Tiny ImageNet-C for evaluation and demonstrate improvements of 5% and 7.3% in accuracy respectively, compared to the State-Of-The-Art (SOTA). We further evaluate our method's stability on a variety of perturbations in CIFAR-10-P and Tiny ImageNet-P, achieving new SOTA in these experiments as well. To further strengthen our understanding regarding CNN's lack of robustness, a decision space visualisation process is proposed and presented in this work. © 2013 IEEE.
A pupil-positioning method based on the starburst model
- Yu, Pingping, Duan, Wenjie, Sun, Yi, Cao, Ning, Wang, Zhenzhou, Lu, Guojun
- Authors: Yu, Pingping , Duan, Wenjie , Sun, Yi , Cao, Ning , Wang, Zhenzhou , Lu, Guojun
- Date: 2020
- Type: Text , Journal article
- Relation: Cmc-Computers Materials & Continua Vol. 64, no. 2 (2020), p. 1199-1217
- Full Text:
- Reviewed:
- Description: Human eye detection has become an area of interest in the field of computer vision with an extensive range of applications in human-computer interaction, disease diagnosis, and psychological and physiological studies. Gaze-tracking systems are an important research topic in the human-computer interaction field. As one of the core modules of the head-mounted gaze-tracking system, pupil positioning affects the accuracy and stability of the system. By tracking eye movements to better locate the center of the pupil, this paper proposes a method for pupil positioning based on the starburst model. The method uses vertical and horizontal coordinate integral projections in the rectangular region of the human eye for accurate positioning and applies a linear interpolation method that is based on a circular model to the reflections in the human eye. In this paper, we propose a method for detecting the feature points of the pupil edge based on the starburst model, which clusters feature points and uses the RANdom SAmple Consensus (RANSAC) algorithm to perform ellipse fitting of the pupil edge to accurately locate the pupil center. Our experimental results show that the algorithm has higher precision, higher efficiency and more robustness than other algorithms and excellent accuracy even when the image of the pupil is incomplete.
- Description: Science and Technology Support Plan Project of Hebei Province (grant numbers 17210803D and 19273703D); Science and Technology Spark Project of the Hebei Seismological Bureau (grant number DZ20180402056); Education Department of Hebei Province (grant number QN2018095); Polytechnic College of Hebei University of Science and Technology
An enhancement to the spatial pyramid matching for image classification and retrieval
- Karmakar, Priyabrata, Teng, Shyh, Lu, Guojun, Zhang, Dengsheng
- Authors: Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun , Zhang, Dengsheng
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Access Vol. 8, no. (2020), p. 22463-22472
- Full Text:
- Reviewed:
- Description: Spatial pyramid matching (SPM) is one of the widely used methods to incorporate spatial information into the image representation. Despite its effectiveness, the traditional SPM is not rotation invariant. A rotation invariant SPM has been proposed in the literature but it has many limitations regarding the effectiveness. In this paper, we investigate how to make SPM robust to rotation by addressing those limitations. In an SPM framework, an image is divided into an increasing number of partitions at different pyramid levels. In this paper, our main focus is on how to partition images in such a way that the resulting structure can deal with image-level rotations. To do that, we investigate three concentric ring partitioning schemes. Apart from image partitioning, another important component of the SPM framework is a weight function. To apportion the contribution of each pyramid level to the final matching between two images, the weight function is needed. In this paper, we propose a new weight function which is suitable for the rotation-invariant SPM structure. Experiments based on image classification and retrieval are performed on five image databases. The detailed result analysis shows that we are successful in enhancing the effectiveness of SPM for image classification and retrieval. © 2013 IEEE.
Voxel-based extraction of individual pylons and wires from lidar point cloud data
- Munir, Nosheen, Awrangjeb, Mohammad, Stantic, Bela, Lu, Guojun, Islam, Syed
- Authors: Munir, Nosheen , Awrangjeb, Mohammad , Stantic, Bela , Lu, Guojun , Islam, Syed
- Date: 2019
- Type: Text , Journal article
- Relation: ISPRS annals of the photogrammetry, remote sensing and spatial information sciences Vol. IV-4/W8, no. (2019), p. 91-98
- Full Text:
- Reviewed:
- Description: Extraction of individual pylons and wires is important for modelling of 3D objects in a power line corridor (PLC) map. However, the existing methods mostly classify points into distinct classes like pylons and wires, but hardly into individual pylons or wires. The proposed method extracts standalone pylons, vegetation and wires from LiDAR data. The extraction of individual objects is needed for a detailed PLC mapping. The proposed approach starts off with the separation of ground and non-ground points. The non-ground points are then classified into vertical (e.g., pylons and vegetation) and non-vertical (e.g., wires) object points using the vertical profile feature (VPF) through the binary support vector machine (SVM) classifier. Individual pylons and vegetation are then separated using their shape and area properties. The locations of pylons are further used to extract the span points between two successive pylons. Finally, span points are voxelised and alignment properties of wires in the voxel grid are used to extract individual wire points. The results are evaluated on a dataset which has multiple spans with bundled wires in each span. The evaluation results show that the proposed method and features are very effective for extraction of individual wires, pylons and vegetation with 99% correctness and 98% completeness.
A new image dissimilarity measure incorporating human perception
- Shojanazeri, Hamid, Teng, Shyh, Aryal, Sunil, Zhang, Dengsheng, Lu, Guojun
- Authors: Shojanazeri, Hamid , Teng, Shyh , Aryal, Sunil , Zhang, Dengsheng , Lu, Guojun
- Date: 2018
- Type: Text , Unpublished work
- Full Text:
- Description: Pairwise (dis)similarity measure of data objects is central to many applications of image analytics, such as image retrieval and classification. Geometric distance, particularly Euclidean distance ((
An Attention-Based Approach for Single Image Super Resolution
- Liu, Yuan, Wang, Yuancheng, Li, Nan, Cheng, Xu, Zhang, Yifeng, Huang, Yongming, Lu, Guojun
- Authors: Liu, Yuan , Wang, Yuancheng , Li, Nan , Cheng, Xu , Zhang, Yifeng , Huang, Yongming , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 24th International Conference on Pattern Recognition, ICPR 2018; Beijing, China; 20th-24th August 2018 Vol. 2018, p. 2777-2784
- Full Text:
- Reviewed:
- Description: The main challenge of single image super resolution (SISR) is the recovery of high frequency details such as tiny textures. However, most of the state-of-the-art methods lack specific modules to identify high frequency areas, causing the output image to be blurred. We propose an attention-based approach to discriminate between texture areas and smooth areas. After the positions of high frequency details are located, high frequency compensation is carried out. This approach can be incorporated into previously proposed SISR networks. By providing high frequency enhancement, better performance and visual effect are achieved. We also propose our own SISR network composed of DenseRes blocks. The block provides an effective way to combine low level and high level features. Extensive benchmark evaluation shows that our proposed method achieves significant improvement over state-of-the-art works in SISR.
Enhanced colour image retrieval with cuboid segmentation
- Murshed, Manzur, Karmakar, Priyabrata, Teng, Shyh, Lu, Guojun
- Authors: Murshed, Manzur , Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018; Canberra, Australia; 10th-13th December 2018
- Full Text:
- Reviewed:
- Description: In this paper, we further investigate our recently proposed cuboid image segmentation algorithm for effective image retrieval. Instead of using all cuboids (i.e. segments), we propose two approaches to choose different subsets of cuboids appropriately. With experimental results on the eBay dataset, we show that our proposals outperform the existing technique in retrieval performance. In addition, we investigate how many segments are required for the most effective image retrieval and provide a quick method to determine a suitable number of cuboids.
- Description: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018
Segmentation of airborne point cloud data for automatic building roof extraction
- Gilani, Syed, Awrangjeb, Mohammad, Lu, Guojun
- Authors: Gilani, Syed , Awrangjeb, Mohammad , Lu, Guojun
- Date: 2018
- Type: Text , Journal article
- Relation: GIScience & Remote Sensing Vol. 55, no. 1 (2018), p. 63-89
- Full Text:
- Reviewed:
- Description: Roof plane segmentation is a complex task since point cloud data carry no connection information and do not provide any semantic characteristics of the underlying scanned surfaces. Point cloud density, complex roof profiles, and occlusion, which are often encountered in practice, add another layer of complexity. In this article, we present a new technique that provides a better interpolation of roof regions where multiple surfaces intersect creating non-manifold points. As a result, these geometric features are preserved to achieve automated identification and segmentation of the roof planes from unstructured laser data. The proposed technique has been tested using the International Society for Photogrammetry and Remote Sensing benchmark and three Australian datasets, which differ in terrain, point density, building sizes, and vegetation. The qualitative and quantitative results show the robustness of the methodology and indicate that the proposed technique can eliminate vegetation and extract buildings as well as their non-occluding parts from the complex scenes at a high success rate for building detection (between 83.9% and 100% per-object completeness) and roof plane extraction (between 73.9% and 96% per-object completeness). The proposed method works more robustly than some existing methods in the presence of occlusion and low point sampling, as indicated by the correctness of above 95% for all the datasets.
A Hybrid data dependent dissimilarity measure for image retrieval
- Shojanazeri, Hamid, Teng, Shyh, Lu, Guojun
- Authors: Shojanazeri, Hamid , Teng, Shyh , Lu, Guojun
- Date: 2017
- Type: Text , Unpublished work
- Full Text:
- Description: In image retrieval, an effective dissimilarity measure is required to retrieve perceptually similar images. Minkowski-type (lp) distance is widely used for image retrieval; however, it has its limitations. It focuses on the distance between image features and ignores the data distribution of the image features, which can play an important role in measuring perceptual similarity of images. lp also favours the most dominant components in calculating the total dissimilarity. A data dependent measure, named mp-dissimilarity, which estimates the dissimilarity using the data distribution, has been proposed recently. Rather than relying on geometric distance, it measures the dissimilarity between two instances in each dimension as a probability mass in a region that encloses the two instances. It considers two instances in a sparse region to be more similar than in a dense region. Using the probability of data mass enables all the dimensions of feature vectors to contribute to the final estimate of dissimilarity, so it is not heavily biased towards the most dominant components. However, relying only on data distribution and completely ignoring the geometric distance raises another limitation. Two instances can be found similar only because they lie in a sparse region, even though a large geometric distance between them means they are not perceptually similar. To address this limitation we propose a new hybrid data dependent dissimilarity (HDDD) measure that considers both data distribution and geometric distance. Our experimental results using the Corel database and Caltech 101 show that HDDD leads to higher image retrieval performance than lp distance (lpD) and mp.
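The data dependent idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' HDDD measure, whose formula is not given here. The function name `mass_dissimilarity`, the closed interval used as the enclosing region, and the power-mean aggregation with exponent `p` are all assumptions made for illustration.

```python
def mass_dissimilarity(x, y, data, p=2.0):
    """Illustrative data dependent dissimilarity (assumed form, not HDDD).

    For each dimension, take the fraction of rows in `data` whose value
    lies in the closed interval spanned by x and y, then combine the
    per-dimension fractions with a power mean of order p.
    """
    n = len(data)
    d = len(x)
    total = 0.0
    for i in range(d):
        lo, hi = min(x[i], y[i]), max(x[i], y[i])
        # Probability mass of the region enclosing the two instances.
        count = sum(1 for row in data if lo <= row[i] <= hi)
        total += (count / n) ** p
    return (total / d) ** (1.0 / p)


# One-dimensional toy data: a dense cluster near 0 and two isolated points.
data = [[0.0], [0.1], [0.2], [0.3], [5.0], [10.0]]
dense_pair = mass_dissimilarity([0.0], [0.3], data)    # geometric gap 0.3
sparse_pair = mass_dissimilarity([5.0], [10.0], data)  # geometric gap 5.0
```

In this toy data the pair in the sparse region scores as more similar than the pair in the dense cluster, although its geometric distance is far larger. That is the behaviour the abstract identifies as a limitation of relying on data distribution alone.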
A new building mask using the gradient of heights for automatic building extraction
- Siddiqui, Fasahat, Awrangjeb, Mohammad, Teng, Shyh, Lu, Guojun
- Authors: Siddiqui, Fasahat , Awrangjeb, Mohammad , Teng, Shyh , Lu, Guojun
- Date: 2016
- Type: Text , Conference proceedings
- Relation: 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA); Gold Coast, Australia; 30th November-2nd December 2016 p. 288-294
- Full Text:
- Reviewed:
- Description: A number of building detection methods have been proposed in the literature. However, they are not effective in detecting small buildings (typically, 50 m²) and buildings with transparent roofs due to the way area thresholds and ground points are used. This paper proposes a new building mask to overcome these limitations, enabling detection of buildings that have transparent roof materials as well as buildings that are small in size. The proposed building detection method transforms the non-ground height information into an intensity image and then analyses the gradient information in the image. It uses a small area threshold of 1 m² and is thereby able to detect small buildings such as garden sheds. The use of non-ground points allows analysis of the gradient on all types of roof materials and, thus, the method is also able to detect buildings with transparent roofs. Our experimental results show that the proposed method can successfully extract buildings even when their roofs are small and/or transparent, thereby achieving relatively higher average completeness and quality.
A robust gradient based method for building extraction from LiDAR and photogrammetric imagery
- Siddiqui, Fasahat, Teng, Shyh, Awrangjeb, Mohammad, Lu, Guojun
- Authors: Siddiqui, Fasahat , Teng, Shyh , Awrangjeb, Mohammad , Lu, Guojun
- Date: 2016
- Type: Text , Journal article
- Relation: Sensors (Switzerland) Vol. 16, no. 7 (2016), p. 1-24
- Full Text:
- Reviewed:
- Description: Existing automatic building extraction methods are not effective in extracting buildings which are small in size and have transparent roofs. The application of a large area threshold prohibits detection of small buildings and the use of ground points in generating the building mask prevents detection of transparent buildings. In addition, the existing methods use numerous parameters to extract buildings in complex environments, e.g., hilly areas and high vegetation. However, the empirical tuning of a large number of parameters reduces the robustness of building extraction methods. This paper proposes a novel Gradient-based Building Extraction (GBE) method to address these limitations. The proposed method transforms the Light Detection And Ranging (LiDAR) height information into an intensity image without interpolation of point heights and then analyses the gradient information in the image. Generally, building roof planes have a constant height change along the slope of a roof plane whereas trees have a random height change. With such an analysis, buildings of a greater range of sizes with a transparent or opaque roof can be extracted. In addition, a local colour matching approach is introduced as a post-processing stage to eliminate trees. This stage of our proposed method does not require any manual setting and all parameters are set automatically from the data. The other post-processing stages including variance, point density and shadow elimination are also applied to verify the extracted buildings, where comparatively fewer empirically set parameters are used. The performance of the proposed GBE method is evaluated on two benchmark data sets by using the object and pixel based metrics (completeness, correctness and quality). Our experimental results show the effectiveness of the proposed method in eliminating trees, extracting buildings of all sizes, and extracting buildings with and without transparent roofs. 
When compared with current state-of-the-art building extraction methods, the proposed method outperforms the existing methods in various evaluation metrics. © 2016 by the authors; licensee MDPI, Basel, Switzerland.
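The observation that roof planes show a constant height change while trees show a random one can be made concrete with a small sketch. This is not the GBE method itself. The function name `height_change_spread`, the use of a 1-D height profile, and the choice of population standard deviation as the regularity score are assumptions made for illustration.

```python
from statistics import pstdev


def height_change_spread(profile):
    """Spread of successive height differences along a 1-D profile.

    A value near zero means the height changes by a constant step, as on a
    planar roof slope; a large value means irregular changes, as in a tree
    canopy. (Hypothetical score, not taken from the paper.)
    """
    if len(profile) < 2:
        raise ValueError("profile needs at least two height samples")
    diffs = [b - a for a, b in zip(profile, profile[1:])]
    return pstdev(diffs)


roof_profile = [10.0, 10.5, 11.0, 11.5]  # constant rise of 0.5 per sample
tree_profile = [10.0, 12.0, 9.0, 13.0]   # irregular height changes
roof_score = height_change_spread(roof_profile)
tree_score = height_change_spread(tree_profile)
```

A threshold on such a score would separate the two profiles here, but any threshold value would need to be chosen from real data.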
An automatic building extraction and regularisation technique using LiDAR point cloud data and orthoimage
- Gilani, Sayed Ali Naqi, Awrangjeb, Mohammad, Lu, Guojun
- Authors: Gilani, Sayed Ali Naqi , Awrangjeb, Mohammad , Lu, Guojun
- Date: 2016
- Type: Text , Journal article
- Relation: Remote Sensing Vol. 8, no. 3 (2016), p. 1-27
- Full Text:
- Reviewed:
- Description: The development of robust and accurate methods for automatic building detection and regularisation using multisource data continues to be a challenge due to point cloud sparsity, high spectral variability, differences between urban objects, surrounding complexity, and data misalignment. To address these challenges, constraints on object size, height, area, and orientation are generally applied, which adversely affect the detection performance. Buildings that are small in size, under shadows or partly occluded are often discarded during the elimination of superfluous objects. To overcome these limitations, a methodology is developed to extract and regularise buildings using features from point cloud and orthoimagery. The building delineation process is carried out by identifying the candidate building regions and segmenting them into grids. Vegetation elimination, building detection and extraction of their partially occluded parts are achieved by synthesising the point cloud and image data. Finally, the detected buildings are regularised by exploiting the image lines in the building regularisation process. Detection and regularisation processes have been evaluated using the ISPRS benchmark and four Australian data sets which differ in point density (1 to 29 points/m²), building sizes, shadows, terrain, and vegetation. Results indicate 83% to 93% per-area completeness with correctness above 95%, demonstrating the robustness of the approach. The absence of over- and many-to-many segmentation errors in the ISPRS data set indicates that the technique has higher per-object accuracy. When compared with six existing similar methods, the proposed detection and regularisation approach performs significantly better on the more complex (Australian) data sets, whereas on the ISPRS benchmark it performs as well as or better than its counterparts. © 2016 by the authors.
Building change detection from LIDAR point cloud data based on connected component analysis
- Awrangjeb, Mohammad, Fraser, Clive, Lu, Guojun
- Authors: Awrangjeb, Mohammad , Fraser, Clive , Lu, Guojun
- Date: 2015
- Type: Text , Conference proceedings
- Relation: ISPRS Geospatial Week 2015; La Grande Motte, France; 28th September-3rd October 2015; published in International Archives of the Photogrammetry Remote Sensing and Spatial Information Sciences Vol. II-3, p. 393-400
- Full Text:
- Reviewed:
- Description: Building data are one of the important data types in a topographic database. Building change detection after a period of time is necessary for many applications, such as identification of informal settlements. Based on the detected changes, the database has to be updated to ensure its usefulness. This paper proposes an improved building detection technique, which is a prerequisite for many building change detection techniques. The improved technique examines the gap between neighbouring buildings in the building mask in order to avoid under-segmentation errors. Then, a new building change detection technique from LIDAR point cloud data is proposed. Buildings which are totally new or demolished are directly added to the change detection output. However, for demolished or extended building parts, a connected component analysis algorithm is applied and for each connected component its area, width and height are estimated in order to ascertain if it can be considered as a demolished or new building part. Finally, a graphical user interface (GUI) has been developed to update detected changes to the existing building map. Experimental results show that the improved building detection technique can offer not only higher performance in terms of completeness and correctness, but also a lower number of under-segmentation errors as compared to its original counterpart. The proposed change detection technique produces no omission errors and thus it can be exploited for enhanced automated building information updating within a topographic database. Using the developed GUI, the user can quickly examine each suggested change and indicate his/her decision with a minimum number of mouse clicks.
Fusion of LiDAR data and multispectral imagery for effective building detection based on graph and connected component analysis
- Gilani, Alinaqi, Awrangjeb, Mohammad, Lu, Guojun
- Authors: Gilani, Alinaqi , Awrangjeb, Mohammad , Lu, Guojun
- Date: 2015
- Type: Text , Conference proceedings
- Full Text:
- Description: Building detection in complex scenes is a non-trivial exercise due to building shape variability, irregular terrain, shadows, and occlusion by highly dense vegetation. In this research, we present a graph based algorithm, which combines multispectral imagery and airborne LiDAR information to completely delineate the building boundaries in urban and densely vegetated areas. In the first phase, LiDAR data is divided into two groups: ground and non-ground data, using ground height from a bare-earth DEM. A mask, known as the primary building mask, is generated from the non-ground LiDAR points where the black region represents the elevated area (buildings and trees), while the white region describes the ground (earth). The second phase begins with the process of Connected Component Analysis (CCA) where the number of objects present in the test scene is identified, followed by initial boundary detection and labelling. Additionally, a graph from the connected components is generated, where each black pixel corresponds to a node. An edge of a unit distance is defined between a black pixel and a neighbouring black pixel, if any. An edge does not exist from a black pixel to a neighbouring white pixel, if any. This produces a disconnected components graph, where each component represents a prospective building or dense vegetation (a contiguous block of black pixels from the primary mask). In the third phase, a clustering process clusters the segmented lines, extracted from multispectral imagery, around the graph components, if possible. In the fourth step, NDVI, image entropy, and LiDAR data are utilised to discriminate between vegetation, buildings, and isolated building's occluded parts. Finally, the initially extracted building boundary is extended pixel-wise using NDVI, entropy, and LiDAR data to completely delineate the building and to maximise the boundary reach towards building edges. 
The proposed technique is evaluated using two Australian data sets: Aitkenvale and Hervey Bay, for object-based and pixel-based completeness, correctness, and quality. The proposed technique detects buildings larger than 50 m² and 10 m² in the Aitkenvale site with 100% and 91% accuracy, respectively, while in the Hervey Bay site it performs better with 100% accuracy for buildings larger than 10 m² in area.
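The graph construction in the second phase (black pixels as nodes, unit edges between neighbouring black pixels, one component per contiguous block) can be sketched as a breadth-first labelling. This is a minimal illustration under stated assumptions, not the authors' implementation: the name `connected_components`, the encoding of black pixels as `1`, and the use of 4-connectivity for "neighbouring" are all choices made here.

```python
from collections import deque


def connected_components(mask):
    """Label groups of 4-connected black pixels (value 1) in a 2-D mask.

    Returns (count, labels), where labels[r][c] is 0 for white pixels and a
    component number starting at 1 for black pixels.
    """
    if not mask:
        return 0, []
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or labels[r][c]:
                continue  # white pixel, or already assigned to a component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                # Edges exist only between neighbouring black pixels.
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] == 1 and not labels[nr][nc]):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return count, labels


# Toy primary mask: 1 = elevated (black), 0 = ground (white).
toy_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
num_components, component_labels = connected_components(toy_mask)
```

Each labelled component in the toy mask corresponds to one prospective building or block of dense vegetation in the sense of the abstract. Switching to 8-connectivity would only require adding the four diagonal offsets.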
- Authors: Gilani, Alinaqi , Awrangjeb, Mohammad , Lu, Guojun
- Date: 2015
- Type: Text , Conference proceedings
- Full Text:
- Description: Building detection in complex scenes is a non-trivial exercise due to building shape variability, irregular terrain, shadows, and occlusion by highly dense vegetation. In this research, we present a graph-based algorithm that combines multispectral imagery and airborne LiDAR information to completely delineate building boundaries in urban and densely vegetated areas. In the first phase, LiDAR data is divided into two groups, ground and non-ground data, using ground height from a bare-earth DEM. A mask, known as the primary building mask, is generated from the non-ground LiDAR points, where the black region represents the elevated area (buildings and trees) and the white region represents the ground (earth). The second phase begins with Connected Component Analysis (CCA), in which the number of objects present in the test scene is identified, followed by initial boundary detection and labelling. Additionally, a graph is generated from the connected components, where each black pixel corresponds to a node. An edge of unit distance is defined between a black pixel and each neighbouring black pixel, if any. No edge exists between a black pixel and a neighbouring white pixel. This produces a graph of disconnected components, where each component represents a prospective building or a patch of dense vegetation (a contiguous block of black pixels from the primary mask). In the third phase, a clustering process groups the line segments extracted from multispectral imagery around the graph components, where possible. In the fourth phase, NDVI, image entropy, and LiDAR data are utilised to discriminate between vegetation, buildings, and the occluded parts of isolated buildings. Finally, the initially extracted building boundary is extended pixel-wise using NDVI, entropy, and LiDAR data to completely delineate the building and to maximise the boundary reach towards building edges.
The proposed technique is evaluated using two Australian data sets, Aitkenvale and Hervey Bay, for object-based and pixel-based completeness, correctness, and quality. The proposed technique detects buildings larger than 50 m² and 10 m² in the Aitkenvale site with 100% and 91% accuracy, respectively, while in the Hervey Bay site it performs better, with 100% accuracy for buildings larger than 10 m² in area.
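The first two phases described above (primary mask, then connected components over black pixels) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `primary_mask` and `label_components`, the 1.0 m height threshold, and the choice of 4-connectivity (suggested by the "unit distance" edges, but not stated in the abstract) are all assumptions made for illustration.

```python
from collections import deque


def primary_mask(heights, ground, threshold=1.0):
    """Return a boolean grid: True for elevated ('black') pixels, False for ground.

    `heights` and `ground` are equally sized 2D lists of elevations; the
    threshold is an assumed value, not one taken from the abstract.
    """
    return [
        [(h - g) > threshold for h, g in zip(h_row, g_row)]
        for h_row, g_row in zip(heights, ground)
    ]


def label_components(mask):
    """Label 4-connected groups of True pixels with 1, 2, ... (0 = ground).

    Each True pixel is a graph node; an edge joins it to each True neighbour
    above, below, left, or right. Breadth-first search then visits one
    disconnected component at a time.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue  # ground pixel, or already assigned to a component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Each labelled component corresponds to one prospective building or vegetation block; separating the two would require the NDVI and entropy cues from the later phases, which this sketch does not attempt.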
Multimodal image registration technique based on improved local feature descriptors
- Teng, Shyh, Hossain, Tanvir, Lu, Guojun
- Authors: Teng, Shyh , Hossain, Tanvir , Lu, Guojun
- Date: 2015
- Type: Text , Journal article
- Relation: Journal of Electronic Imaging Vol. 24, no. 1 (2015), p.
- Full Text:
- Reviewed:
- Description: Multimodal image registration has received significant research attention over the past decade, and the majority of the techniques are global in nature. Although local techniques are widely used for general image registration, there are only limited studies on them for multimodal image registration. Scale invariant feature transform (SIFT) is a well-known general image registration technique. However, SIFT descriptors are not invariant to multimodality. We propose a SIFT-based technique that is modality invariant and still retains the strengths of local techniques. Moreover, our proposed histogram weighting strategies also improve the accuracy of descriptor matching, which is an important image registration step. As a result, our proposed strategies can not only improve the multimodal registration accuracy but also have the potential to improve the performance of all SIFT-based applications, e.g., general image registration and object recognition.