Simplifying and improving ant-based clustering
- Tan, Swee, Ting, Kaiming, Teng, Shyh
- Authors: Tan, Swee , Ting, Kaiming , Teng, Shyh
- Date: 2011
- Type: Text , Conference paper
- Relation: 11th International Conference on Computational Science, ICCS 2011; Singapore, Singapore; 1st-3rd June 2011, published in Procedia Computer Science Vol. 4, p. 46-55
- Full Text:
- Reviewed:
- Description: Ant-based clustering (ABC) is a data clustering approach inspired by cemetery formation activities observed in real ant colonies. Building upon the premise of collective intelligence, such an approach uses multiple ant-like agents and a mixture of heuristics in order to create systems that are capable of clustering real-world data. Many recently proposed ABC systems have shown competitive results, but these systems are geared towards adding new heuristics, resulting in increasingly complex systems that are harder to understand and improve. In contrast to this direction, we demonstrate that a state-of-the-art ABC system can be systematically evaluated and then simplified. The streamlined model, which we call SABC, differs fundamentally from traditional ABC systems as it does not use the ant colony or several key components. Yet, our empirical study shows that SABC performs more effectively and efficiently than the state-of-the-art ABC system.
An improved building detection in complex sites using the LIDAR height variation and point density
- Siddiqui, Fasahat, Teng, Shyh, Lu, Guojun, Awrangjeb, Mohammad
- Authors: Siddiqui, Fasahat , Teng, Shyh , Lu, Guojun , Awrangjeb, Mohammad
- Date: 2013
- Type: Text , Conference proceedings
- Relation: 2013 28th International Conference on Image and Vision Computing New Zealand, IVCNZ 2013; Wellington; New Zealand; 27th-29th November 2013; published in International Conference Image and Vision Computing New Zealand p. 471-476
- Full Text:
- Reviewed:
- Description: In this paper, the height variation in LIDAR (Light Detection And Ranging) point cloud data and point density are analyzed to remove false building detections in highly vegetated and hilly sites. In general, the LIDAR points in a tree area have higher height variations than those in a building area. Moreover, the density of points having similar height values is lower in a tree area than in a building area. The proposed method uses such information as an improvement to a current state-of-the-art building detection method. Qualitative and object-based quantitative analyses have been performed to verify the effectiveness of the proposed building detection method as compared with a current method. The analysis shows that the proposed building detection method successfully reduces false building detection (i.e. trees in highly complex sites in Australia and Germany), and the average correctness and quality have been improved by 6.36% and 6.16% respectively.
Automatic Extraction of Buildings in an Urban Region
- Siddiqui, Fasahat, Teng, Shyh, Lu, Guojun, Awrangjeb, Mohammad
- Authors: Siddiqui, Fasahat , Teng, Shyh , Lu, Guojun , Awrangjeb, Mohammad
- Date: 2014
- Type: Text , Conference proceedings
- Relation: 29th International Conference on Image and Vision Computing New Zealand, IVCNZ 2014; Hamilton; New Zealand; 19th-21st November 2014; published in ACM International Conference Proceeding Series p. 178-183
- Full Text:
- Reviewed:
- Description: There are currently several automatic building extraction methods introduced in the literature, but none of them is capable of completely extracting portions of a building that are below a pre-defined building minimum height threshold. This paper proposes a systematic method which analyzes the height differences between the extracted adjacent planes above and below the height threshold as well as the planes' connectivity, thereby extracting all portions belonging to buildings more completely. In general, the height difference between the edges of the adjacent planes above and below the height threshold that belong to the same building is more uniform. In addition, the extracted planes below the height threshold that belong to a building and their adjacent ground planes also have a clear height difference. The proposed method incorporates such information to achieve better performance in building extraction. We have compared our proposed method to a current state-of-the-art building extraction method qualitatively and quantitatively. Our experimental results show that our proposed method successfully recovers portions of a building below the height threshold, thereby achieving relatively higher average completeness (an improvement of 1.14%) and quality (an improvement of 0.93%).
Multimodal image registration technique based on improved local feature descriptors
- Teng, Shyh, Hossain, Tanvir, Lu, Guojun
- Authors: Teng, Shyh , Hossain, Tanvir , Lu, Guojun
- Date: 2015
- Type: Text , Journal article
- Relation: Journal of Electronic Imaging Vol. 24, no. 1 (2015), p.
- Full Text:
- Reviewed:
- Description: Multimodal image registration has received significant research attention over the past decade, and the majority of the techniques are global in nature. Although local techniques are widely used for general image registration, there are only limited studies on them for multimodal image registration. Scale invariant feature transform (SIFT) is a well-known general image registration technique. However, SIFT descriptors are not invariant to multimodality. We propose a SIFT-based technique that is modality invariant and still retains the strengths of local techniques. Moreover, our proposed histogram weighting strategies also improve the accuracy of descriptor matching, which is an important image registration step. As a result, our proposed strategies can not only improve the multimodal registration accuracy but also have the potential to improve the performance of all SIFT-based applications, e.g., general image registration and object recognition.
A new building mask using the gradient of heights for automatic building extraction
- Siddiqui, Fasahat, Awrangjeb, Mohammad, Teng, Shyh, Lu, Guojun
- Authors: Siddiqui, Fasahat , Awrangjeb, Mohammad , Teng, Shyh , Lu, Guojun
- Date: 2016
- Type: Text , Conference proceedings
- Relation: 2016 International Conference on Digital Image Computing: Techniques and Applications (Dicta); Gold Coast, Australia; 30th November-2nd December 2016 p. 288-294
- Full Text:
- Reviewed:
- Description: A number of building detection methods have been proposed in the literature. However, they are not effective in detecting small buildings (typically, 50 m²) and buildings with transparent roofs due to the way area thresholds and ground points are used. This paper proposes a new building mask to overcome these limitations, enabling detection of buildings that have transparent roof materials as well as buildings that are small in size. The proposed building detection method transforms the non-ground height information into an intensity image and then analyses the gradient information in the image. It uses a small area threshold of 1 m² and is thereby able to detect small buildings such as garden sheds. The use of non-ground points allows analysis of the gradient on all types of roof materials and, thus, the method is also able to detect buildings with transparent roofs. Our experimental results show that the proposed method can successfully extract buildings even when their roofs are small and/or transparent, thereby achieving relatively higher average completeness and quality.
A robust gradient based method for building extraction from LiDAR and photogrammetric imagery
- Siddiqui, Fasahat, Teng, Shyh, Awrangjeb, Mohammad, Lu, Guojun
- Authors: Siddiqui, Fasahat , Teng, Shyh , Awrangjeb, Mohammad , Lu, Guojun
- Date: 2016
- Type: Text , Journal article
- Relation: Sensors (Switzerland) Vol. 16, no. 7 (2016), p. 1-24
- Full Text:
- Reviewed:
- Description: Existing automatic building extraction methods are not effective in extracting buildings which are small in size and have transparent roofs. The application of a large area threshold prohibits detection of small buildings, and the use of ground points in generating the building mask prevents detection of transparent buildings. In addition, the existing methods use numerous parameters to extract buildings in complex environments, e.g., hilly areas and high vegetation. However, the empirical tuning of a large number of parameters reduces the robustness of building extraction methods. This paper proposes a novel Gradient-based Building Extraction (GBE) method to address these limitations. The proposed method transforms the Light Detection And Ranging (LiDAR) height information into an intensity image without interpolation of point heights and then analyses the gradient information in the image. Generally, building roof planes have a constant height change along the slope of a roof plane whereas trees have a random height change. With such an analysis, buildings of a greater range of sizes with a transparent or opaque roof can be extracted. In addition, a local colour matching approach is introduced as a post-processing stage to eliminate trees. This stage of our proposed method does not require any manual setting and all parameters are set automatically from the data. The other post-processing stages, including variance, point density and shadow elimination, are also applied to verify the extracted buildings, where comparatively fewer empirically set parameters are used. The performance of the proposed GBE method is evaluated on two benchmark data sets by using object- and pixel-based metrics (completeness, correctness and quality). Our experimental results show the effectiveness of the proposed method in eliminating trees, extracting buildings of all sizes, and extracting buildings with and without transparent roofs. When compared with current state-of-the-art building extraction methods, the proposed method outperforms the existing methods in various evaluation metrics. © 2016 by the authors; licensee MDPI, Basel, Switzerland.
A Hybrid data dependent dissimilarity measure for image retrieval
- Shojanazeri, Hamid, Teng, Shyh, Lu, Guojun
- Authors: Shojanazeri, Hamid , Teng, Shyh , Lu, Guojun
- Date: 2017
- Type: Text , Unpublished work
- Full Text:
- Description: In image retrieval, an effective dissimilarity measure is required to retrieve perceptually similar images. Minkowski-type (lp) distance is widely used for image retrieval; however, it has its limitations. It focuses on the distance between image features and ignores the data distribution of the image features, which can play an important role in measuring the perceptual similarity of images. lp also favours the most dominant components in calculating the total dissimilarity. A data dependent measure, named mp-dissimilarity, which estimates the dissimilarity using the data distribution, has been proposed recently. Rather than relying on geometric distance, it measures the dissimilarity between two instances in each dimension as a probability mass in a region that encloses the two instances. It considers two instances in a sparse region to be more similar than two instances in a dense region. Using the probability of data mass enables all the dimensions of the feature vectors to contribute to the final estimate of dissimilarity, so it is not heavily biased towards the most dominant components. However, relying only on data distribution and completely ignoring the geometric distance raises another limitation: two instances can be found similar only because they lie in a sparse region, yet if the geometric distance between them is large they are not perceptually similar. To address this limitation we propose a new hybrid data dependent dissimilarity (HDDD) measure that considers both data distribution and geometric distance. Our experimental results using the Corel database and Caltech 101 show that HDDD leads to higher image retrieval performance than lp distance (lpD) and mp.
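The contrast drawn in this description, geometric distance versus probability mass, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the enclosing region is assumed to be the closed interval between the two values in each dimension, and the per-dimension masses are assumed to be combined with a power mean. The HDDD combination itself is not specified in the description, so it is not sketched here.

```python
def lp_distance(x, y, p=2.0):
    """Minkowski-type (lp) distance: depends only on the geometric gap."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def mass_dissimilarity(x, y, data, p=2.0):
    """Simplified mass-based dissimilarity (an assumption, not the published formula).

    For each dimension, count the fraction of data points whose value lies
    in the closed interval spanned by x and y, then combine the fractions
    with a power mean over the dimensions.
    """
    n = len(data)
    d = len(x)
    total = 0.0
    for i in range(d):
        lo, hi = min(x[i], y[i]), max(x[i], y[i])
        mass = sum(1 for row in data if lo <= row[i] <= hi) / n
        total += mass ** p
    return (total / d) ** (1.0 / p)


# Hypothetical 1-D data: a dense cluster near 1.0 and a sparse pair near 5.0.
data = [[0.0], [1.0], [1.1], [1.2], [1.3], [5.0], [5.3]]

dense = mass_dissimilarity([1.0], [1.3], data)   # interval covers 4 of 7 points
sparse = mass_dissimilarity([5.0], [5.3], data)  # interval covers 2 of 7 points
```

Both pairs are separated by roughly the same geometric gap, so lp treats them almost identically, while the mass-based measure rates the pair in the sparse region as more similar. This is the behaviour the description attributes to the data dependent measure.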
A new image dissimilarity measure incorporating human perception
- Shojanazeri, Hamid, Teng, Shyh, Aryal, Sunil, Zhang, Dengsheng, Lu, Guojun
- Authors: Shojanazeri, Hamid , Teng, Shyh , Aryal, Sunil , Zhang, Dengsheng , Lu, Guojun
- Date: 2018
- Type: Text , Unpublished work
- Full Text:
- Description: Pairwise (dis)similarity measure of data objects is central to many applications of image analytics, such as image retrieval and classification. Geometric distance, particularly Euclidean distance ((
Cuboid colour image segmentation using intuitive distance measure
- Tania, Sheikh, Murshed, Manzur, Teng, Shyh, Karmakar, Gour
- Authors: Tania, Sheikh , Murshed, Manzur , Teng, Shyh , Karmakar, Gour
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 Vol. 2018-November, p. 1-6
- Full Text:
- Reviewed:
- Description: In this paper, an improved algorithm for cuboid image segmentation is proposed. To address the two main limitations of the recently proposed cuboid segmentation algorithm, the improved algorithm substitutes colour quantization in HCL colour space with infinity norm distance in RGB colour space along with a different way to impose area thresholding. We also propose a new metric to evaluate the quality of segmentation. Experimental results show that the proposed cuboid segmentation algorithm significantly outperforms the existing cuboid segmentation algorithm in terms of quality of segmentation.
- Description: International Conference Image and Vision Computing New Zealand
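The infinity norm distance in RGB colour space named in the description above can be sketched in Python as follows. This shows only the distance itself, assuming colours are given as (R, G, B) triples; the function name is hypothetical, and the cuboid segmentation algorithm and its area thresholding are not reproduced.

```python
def infinity_norm_rgb(c1, c2):
    """Infinity norm (Chebyshev) distance: the largest per-channel difference."""
    return max(abs(a - b) for a, b in zip(c1, c2))


# Two reds that differ by 5 in R and 10 in G: the distance is the larger gap.
d = infinity_norm_rgb((255, 0, 0), (250, 10, 0))
```

Under this distance, two colours are close only if every channel is close, since a large gap in a single channel determines the result.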
Enhanced colour image retrieval with cuboid segmentation
- Murshed, Manzur, Karmakar, Priyabrata, Teng, Shyh, Lu, Guojun
- Authors: Murshed, Manzur , Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun
- Date: 2018
- Type: Text , Conference proceedings , Conference paper
- Relation: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018; Canberra, Australia; 10th-13th December 2018
- Full Text:
- Reviewed:
- Description: In this paper, we further investigate our recently proposed cuboid image segmentation algorithm for effective image retrieval. Instead of using all cuboids (i.e. segments), we propose two approaches to choose different subsets of cuboids appropriately. With experimental results on the eBay dataset, we show that our proposals outperform the existing technique in retrieval performance. In addition, we investigate how many segments are required for the most effective image retrieval and provide a quick method to determine a suitable number of cuboids.
- Description: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018
Hierarchical colour image segmentation by leveraging RGB channels independently
- Tania, Sheikh, Murshed, Manzur, Teng, Shyh, Karmakar, Gour
- Authors: Tania, Sheikh , Murshed, Manzur , Teng, Shyh , Karmakar, Gour
- Date: 2019
- Type: Text , Conference paper
- Relation: 9th Pacific-Rim Symposium on Image and Video Technology, PSIVT 2019 Vol. 11854 LNCS, p. 197-210
- Full Text:
- Reviewed:
- Description: In this paper, we introduce a hierarchical colour image segmentation based on cuboid partitioning using simple statistical features of the pixel intensities in the RGB channels. Estimating the difference between any two colours is a challenging task. As most colour models are not perceptually uniform, an alternative strategy is much needed. To address this issue, for our proposed technique, we present a new concept for a colour distance measure based on the inconsistency of the pixel intensities of an image, which is more compliant with human perception. Constructing a reliable set of superpixels from an image is fundamental for further merging. As cuboid partitioning is a superior candidate to produce superpixels, we use agglomerative merging to yield the final segmentation results, exploiting the outcome of our proposed cuboid partitioning. The proposed cuboid segmentation based algorithm significantly outperforms not only the quadtree-based segmentation but also existing state-of-the-art segmentation algorithms in terms of quality of segmentation for the benchmark datasets used in image segmentation. © 2019, Springer Nature Switzerland AG.
Improved image analysis methodology for detecting changes in evidence positioning at crime scenes
- Petty, Mark, Teng, Shyh, Murshed, Manzur
- Authors: Petty, Mark , Teng, Shyh , Murshed, Manzur
- Date: 2019
- Type: Text , Conference proceedings , Conference paper
- Relation: 2019 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2019
- Full Text:
- Reviewed:
- Description: This paper proposes an improved methodology to assist forensic investigators in detecting positional change of objects due to crime scene contamination. Crime scene contamination can occur, either intentionally or by accident, during the investigation and documentation process. The proposed methodology utilises an ASIFT-based feature detection algorithm that compares pre- and post-contamination images of the same scene, taken from different viewpoints. The contention is that the ASIFT registration technique is better suited to real-world crime scene photography, being more robust to the affine distortion that occurs when capturing images from different viewpoints. The proposed methodology was tested with both the SIFT and ASIFT registration techniques to show that (1) it could identify missing, planted and displaced objects using both SIFT and ASIFT and (2) ASIFT is superior to SIFT in terms of error in displacement estimation, especially for larger viewpoint discrepancies between the pre- and post-contamination images. This supports the contention that our proposed methodology in combination with ASIFT is better suited to handle real-world crime scene photography. © 2019 IEEE.
- Description: E1
An enhancement to the spatial pyramid matching for image classification and retrieval
- Karmakar, Priyabrata, Teng, Shyh, Lu, Guojun, Zhang, Dengsheng
- Authors: Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun , Zhang, Dengsheng
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Access Vol. 8, no. (2020), p. 22463-22472
- Full Text:
- Reviewed:
- Description: Spatial pyramid matching (SPM) is one of the widely used methods to incorporate spatial information into the image representation. Despite its effectiveness, the traditional SPM is not rotation invariant. A rotation invariant SPM has been proposed in the literature but it has many limitations regarding the effectiveness. In this paper, we investigate how to make SPM robust to rotation by addressing those limitations. In an SPM framework, an image is divided into an increasing number of partitions at different pyramid levels. In this paper, our main focus is on how to partition images in such a way that the resulting structure can deal with image-level rotations. To do that, we investigate three concentric ring partitioning schemes. Apart from image partitioning, another important component of the SPM framework is a weight function. To apportion the contribution of each pyramid level to the final matching between two images, the weight function is needed. In this paper, we propose a new weight function which is suitable for the rotation-invariant SPM structure. Experiments based on image classification and retrieval are performed on five image databases. The detailed result analysis shows that we are successful in enhancing the effectiveness of SPM for image classification and retrieval. © 2013 IEEE.
Learning large margin multiple granularity features with an improved siamese network for person re-identification
- Li, Da-Xiang, Fei, Gy, Teng, Shyh
- Authors: Li, Da-Xiang , Fei, Gy , Teng, Shyh
- Date: 2020
- Type: Text , Journal article
- Relation: Symmetry-Basel Vol. 12, no. 1 (Jan 2020), p. 16
- Full Text:
- Reviewed:
- Description: Person re-identification (Re-ID) is a non-overlapping multi-camera retrieval task to match different images of the same person, and it has become a hot research topic in many fields, such as surveillance security, criminal investigation, and video analysis. As one kind of important architecture for person re-identification, Siamese networks usually adopt standard softmax loss function, and they can only obtain the global features of person images, ignoring the local features and the large margin for classification. In this paper, we design a novel symmetric Siamese network model named Siamese Multiple Granularity Network (SMGN), which can jointly learn the large margin multiple granularity features and similarity metrics for person re-identification. Firstly, two branches for global and local feature extraction are designed in the backbone of the proposed SMGN model, and the extracted features are concatenated together as multiple granularity features of person images. Then, to enhance their discriminating ability, the multiple channel weighted fusion (MCWF) loss function is constructed for the SMGN model, which includes the verification loss and identification loss of the training image pair. Extensive comparative experiments on four benchmark datasets (CUHK01, CUHK03, Market-1501 and DukeMTMC-reID) show the effectiveness of our proposed method and its performance outperforms many state-of-the-art methods.
Network representation learning: From traditional feature learning to deep learning
- Sun, Ke, Wang, Lei, Xu, Bo, Zhao, Wenhong, Teng, Shyh, Xia, Feng
- Authors: Sun, Ke , Wang, Lei , Xu, Bo , Zhao, Wenhong , Teng, Shyh , Xia, Feng
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Access Vol. 8, no. (2020), p. 205600-205617
- Full Text:
- Reviewed:
- Description: Network representation learning (NRL) is an effective graph analytics technique that helps users to deeply understand the hidden characteristics of graph data. It has been successfully applied in many real-world tasks related to network science, such as social network data processing, biological information processing, and recommender systems. Deep Learning is a powerful tool to learn data features. However, it is non-trivial to generalize deep learning to graph-structured data since it is different from regular data such as pictures having spatial information and sounds having temporal information. Recently, researchers proposed many deep learning-based methods in the area of NRL. In this survey, we investigate classical NRL from traditional feature learning methods to deep learning-based models, analyze relationships between them, and summarize the latest progress. Finally, we discuss open issues concerning NRL and point out the future directions in this field. © 2020 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
A novel fusion approach in the extraction of kernel descriptor with improved effectiveness and efficiency
- Karmakar, Priyabrata, Teng, Shyh, Lu, Guojun, Zhang, Dengsheng
- Authors: Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun , Zhang, Dengsheng
- Date: 2021
- Type: Text , Journal article
- Relation: Multimedia Tools and Applications Vol. 80, no. 10 (Apr 2021), p. 14545-14564
- Full Text:
- Reviewed:
- Description: Image representation using feature descriptors is crucial. A number of histogram-based descriptors are widely used for this purpose. However, histogram-based descriptors have certain limitations and kernel descriptors (KDES) are proven to overcome them. Moreover, the combination of more than one KDES performs better than an individual KDES. Conventionally, KDES fusion is performed by concatenating them after the gradient, colour and shape descriptors have been extracted. This approach has limitations in regard to the efficiency as well as the effectiveness. In this paper, we propose a novel approach to fuse different image features before the descriptor extraction, resulting in a compact descriptor which is efficient and effective. In addition, we have investigated the effect on the proposed descriptor when texture-based features are fused along with the conventionally used features. Our proposed descriptor is examined on two publicly available image databases and shown to provide outstanding performances.
Adversarial network with multiple classifiers for open set domain adaptation
- Shermin, Tasfia, Lu, Guojun, Teng, Shyh, Murshed, Manzur, Sohel, Ferdous
- Authors: Shermin, Tasfia , Lu, Guojun , Teng, Shyh , Murshed, Manzur , Sohel, Ferdous
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 23, no. (2021), p. 2732-2744
- Full Text:
- Reviewed:
- Description: Domain adaptation aims to transfer knowledge from a domain with adequate labeled samples to a domain with scarce labeled samples. Prior research has introduced various open set domain adaptation settings in the literature to extend the applications of domain adaptation methods in real-world scenarios. This paper focuses on the type of open set domain adaptation setting where the target domain has both private ('unknown classes') label space and the shared ('known classes') label space. However, the source domain only has the 'known classes' label space. Prevalent distribution-matching domain adaptation methods are inadequate in such a setting that demands adaptation from a smaller source domain to a larger and diverse target domain with more classes. For addressing this specific open set domain adaptation setting, prior research introduces a domain adversarial model that uses a fixed threshold for distinguishing known from unknown target samples and lacks at handling negative transfers. We extend their adversarial model and propose a novel adversarial domain adaptation model with multiple auxiliary classifiers. The proposed multi-classifier structure introduces a weighting module that evaluates distinctive domain characteristics for assigning the target samples with weights which are more representative to whether they are likely to belong to the known and unknown classes to encourage positive transfers during adversarial training and simultaneously reduces the domain gap between the shared classes of the source and target domains. A thorough experimental investigation shows that our proposed method outperforms existing domain adaptation methods on a number of domain adaptation datasets. © 1999-2012 IEEE.
Robust image classification using a low-pass activation function and DCT augmentation
- Hossain, Md Tahmid, Teng, Shyh, Sohel, Ferdous, Lu, Guojun
- Authors: Hossain, Md Tahmid , Teng, Shyh , Sohel, Ferdous , Lu, Guojun
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 86460-86474
- Full Text:
- Reviewed:
- Description: Convolutional Neural Network's (CNN's) performance disparity on clean and corrupted datasets has recently come under scrutiny. In this work, we analyse common corruptions in the frequency domain, i.e., High Frequency corruptions (HFc, e.g., noise) and Low Frequency corruptions (LFc, e.g., blur). Although a simple solution to HFc is low-pass filtering, ReLU - a widely used Activation Function (AF), does not have any filtering mechanism. In this work, we instill low-pass filtering into the AF (LP-ReLU) to improve robustness against HFc. To deal with LFc, we complement LP-ReLU with Discrete Cosine Transform based augmentation. LP-ReLU, coupled with DCT augmentation, enables a deep network to tackle the entire spectrum of corruption. We use CIFAR-10-C and Tiny ImageNet-C for evaluation and demonstrate improvements of 5% and 7.3% in accuracy respectively, compared to the State-Of-The-Art (SOTA). We further evaluate our method's stability on a variety of perturbations in CIFAR-10-P and Tiny ImageNet-P, achieving new SOTA in these experiments as well. To further strengthen our understanding regarding CNN's lack of robustness, a decision space visualisation process is proposed and presented in this work. © 2013 IEEE.
Bidirectional mapping coupled GAN for generalized zero-shot learning
- Shermin, Tasfia, Teng, Shyh, Sohel, Ferdous, Murshed, Manzur, Lu, Guojun
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 31, no. (2022), p. 721-733
- Full Text:
- Reviewed:
- Description: Bidirectional mapping-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen-unseen classes and preserving the distinction between seen-unseen classes is crucial for GZSL methods. However, existing methods only learn the underlying distribution of seen data, although unseen class semantics are available in the GZSL problem setting. Most methods neglect retaining seen-unseen classes distinction and use the learned distribution to recognize seen and unseen data. Consequently, they do not perform well. In this work, we utilize the available unseen class semantics alongside seen class semantics and learn joint distribution through a strong visual-semantic coupling. We propose a bidirectional mapping coupled generative adversarial network (BMCoGAN) by extending the concept of the coupled generative adversarial network into a bidirectional mapping model. We further integrate a Wasserstein generative adversarial optimization to supervise the joint distribution learning. We design a loss optimization for retaining distinctive information of seen-unseen classes in the synthesized features and reducing bias towards seen classes, which pushes synthesized seen features towards real seen features and pulls synthesized unseen features away from real seen features. We evaluate BMCoGAN on benchmark datasets and demonstrate its superior performance against contemporary methods. © 1992-2012 IEEE.
Integrated generalized zero-shot learning for fine-grained classification
- Shermin, Tasfia, Teng, Shyh, Sohel, Ferdous, Murshed, Manzur, Lu, Guojun
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 122, no. (2022), p.
- Full Text:
- Reviewed:
- Description: Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and a FS sub-network. Consequently, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. © 2021 Elsevier Ltd