Crop monitoring by multimodal remote sensing : a review
- Authors: Karmakar, Priyabrata , Teng, Shyh , Murshed, Manzur , Pang, Shaoning , Li, Yanyu , Lin, Hao
- Date: 2024
- Type: Text , Journal article , Review
- Relation: Remote Sensing Applications: Society and Environment Vol. 33, no. (2024), p.
- Full Text:
- Reviewed:
- Description: Effective approaches to achieve food safety and security can prevent catastrophic situations. Therefore, it is required to monitor agricultural crops on a regular basis. This can be easily achieved by capturing data from various remote sensing (RS) devices followed by processing them. Most RS devices are useful in monitoring crops and analysing different stages of plant growth successfully. However, individual devices have some limitations. To overcome this, multimodal remote sensing (MRS) methods have been gradually gaining popularity. In the multimodal approach, data from more than one modality are used together to obtain a better outcome. This is because, different modalities of data when used together can complement each other to achieve the same objective by combining their strengths and reducing their limitations, simultaneously. MRS methods have been found to be particularly useful for crop monitoring as they allow for the integration of data from multiple sources, resulting in a more comprehensive understanding of plant growth and development. By using MRS methods, it is possible to obtain a more accurate and detailed analysis of crop conditions, leading to improved decision-making and ultimately, better crop yields. In this paper, we will explore how MRS methods have been successfully utilised in crop monitoring and how the data obtained from these methods can provide valuable insights into the health and development of plants. © 2023 The Authors
A robust local texture descriptor in the parametric space of the weibull distribution
- Authors: Tania, Sheikh , Karmakar, Gour , Teng, Shyh , Murshed, Manzur
- Date: 2023
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 25, no. (2023), p. 6053-6066
- Full Text: false
- Reviewed:
- Description: Research in texture feature approximation is still in the embryonic stage because of difficulties in developing a sound theoretical model to express the unique pattern in the intensity-variation of pixels in the neighbourhood of the pixel-of-interest so that it can sufficiently discriminate different textures. Local texture descriptors are widely used in image segmentation as they comprise pixel-wise features. The Weber local descriptor (WLD) with differential excitation and gradient orientation components, inspired by Weber's Law, has been leveraged in the state-of-the-art iterative contraction and merging (ICM) image segmentation technique. However, WLD has inherent drawbacks in the formulation of the components that limit its discriminatory capability. This paper introduces a novel texture descriptor by directly modelling the distribution of intensity-variation in the parametric space of the Weibull distribution using its shape and scale parameters. A unified 'joint scale' texture property is introduced, which can discriminate textures better than the individual parameters while keeping the length of the descriptor shorter. Additionally, the accuracy of WLD's gradient orientation component is improved by using an extended Sobel operator and expressing gradients in -
Anti-aliasing deep image classifiers using novel depth adaptive blurring and activation function
- Authors: Hossain, Md Tahmid , Teng, Shyh , Lu, Guojun , Rahman, Mohammad Arifur , Sohel, Ferdous
- Date: 2023
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 536, no. (2023), p. 164-174
- Full Text: false
- Reviewed:
- Description: Deep convolutional networks are vulnerable to image translation or shift, partly due to common down-sampling layers, e.g., max-pooling and strided convolution. These operations violate the Nyquist sampling rate and cause aliasing. The textbook solution is low-pass filtering (blurring) before down-sampling, which can benefit deep networks as well. Even so, non-linearity units, such as ReLU, often re-introduce the problem, suggesting that blurring alone may not suffice. In this work, first, we analyse deep features with Fourier transform and show that Depth Adaptive Blurring is more effective, as opposed to monotonic blurring. To this end, we propose a novel Depth Adaptive Blur-pool (DAB-pool) module to replace existing down-sampling methods. Second, we introduce a novel activation function – with a built-in low pass filter, as an additional measure, to keep the problem from reappearing. From experiments, we observe generalisation on other forms of transformations and corruptions as well, e.g., rotation, scale, and noise. We evaluate our method under three challenging settings: (1) a variety of image translations; (2) adversarial attacks – both
Comparative analysis of machine and deep learning models for soil properties prediction from hyperspectral visual band
- Authors: Datta, Dristi , Paul, Manoranjan , Murshed, Manzur , Teng, Shyh , Schmidtke, Leigh
- Date: 2023
- Type: Text , Journal article
- Relation: Environments Vol. 10, no. 5 (2023), p. 77
- Full Text:
- Reviewed:
- Description: Estimating various properties of soil, including moisture, carbon, and nitrogen, is crucial for studying their correlation with plant health and food production. However, conventional methods such as oven-drying and chemical analysis are laborious, expensive, and only feasible for a limited land area. With the advent of remote sensing technologies like multi/hyperspectral imaging, it is now possible to predict soil properties non-invasive and cost-effectively for a large expanse of bare land. Recent research shows the possibility of predicting those soil contents from a wide range of hyperspectral data using good prediction algorithms. However, these kinds of hyperspectral sensors are expensive and not widely available. Therefore, this paper investigates different machine and deep learning techniques to predict soil nutrient properties using only the red (R), green (G), and blue (B) bands data to propose a suitable machine/deep learning model that can be used as a rapid soil test. Another objective of this research is to observe and compare the prediction accuracy in three cases i. hyperspectral band ii. full spectrum of the visual band, and iii. three-channel of RGB band and provide a guideline to the user on which spectrum information they should use to predict those soil properties. The outcome of this research helps to develop a mobile application that is easy to use for a quick soil test. This research also explores learning-based algorithms with significant feature combinations and their performance comparisons in predicting soil properties from visual band data. For this, we also explore the impact of dimensional reduction (i.e., principal component analysis) and transformations (i.e., empirical mode decomposition) of features. The results show that the proposed model can comparably predict the soil contents from the three-channel RGB data.
MCSNet+ : enhanced convolutional neural network for detection and classification of tribolium and sitophilus sibling species in actual wheat storage environments
- Authors: Yang, Haiying , Li, Yanyu , Xin, Liyong , Teng, Shyh , Pang, Shaoning , Zhao, Huiyi , Cao, Yang , Zhou, Xiaoguang
- Date: 2023
- Type: Text , Journal article
- Relation: Foods Vol. 12, no. 19 (2023), p.
- Full Text:
- Reviewed:
- Description: Insect pests like Tribolium and Sitophilus siblings are major threats to grain storage and processing, causing quality and quantity losses that endanger food security. These closely related species, having very similar morphological and biological characteristics, often exhibit variations in biology and pesticide resistance, complicating control efforts. Accurate pest species identification is essential for effective control, but workplace safety in the grain bin associated with grain deterioration, clumping, fumigator hazards, and air quality create challenges. Therefore, there is a pressing need for an online automated detection system. In this work, we enriched the stored-grain pest sibling image dataset, which includes 25,032 annotated Tribolium samples of two species and five geographical strains from real warehouse and another 1774 from the lab. As previously demonstrated on the Sitophilus family, Convolutional Neural Networks demonstrate distinct advantages over other model architectures in detecting Tribolium. Our CNN model, MCSNet+, integrates Soft-NMS for better recall in dense object detection, a Position-Sensitive Prediction Model to handle translation issues, and anchor parameter fine-tuning for improved matching and speed. This approach significantly enhances mean Average Precision (mAP) for Sitophilus and Tribolium, reaching a minimum of 92.67 ± 1.74% and 94.27 ± 1.02%, respectively. Moreover, MCSNet+ exhibits significant improvements in prediction speed, advancing from 0.055 s/img to 0.133 s/img, and elevates the recognition rates of moving insect sibling species in real wheat storage and visible light, rising from 2.32% to 2.53%. The detection performance of the model on laboratory-captured images surpasses that of real storage facilities, with better results for Tribolium compared to Sitophilus. Although inter-strain variances are less pronounced, the model achieves acceptable detection results across different Tribolium geographical strains, with a minimum recognition rate of 82.64 ± 1.27%. In real-time monitoring videos of grain storage facilities with wheat backgrounds, the enhanced deep learning model based on Convolutional Neural Networks successfully detects and identifies closely related stored-grain pest images. This achievement provides a viable solution for establishing an online pest management system in real storage facilities. © 2023 by the authors.
Auto-identification of two Sitophilus sibling species on stored wheat using deep convolutional neural network
- Authors: Yang, Haiying , Zhao, Huiyi , Zhang, Dufeng , Cao, Yang , Teng, Shyh , Pang, Shaoning , Zhou, Xiaoguang , Li, Yanyu
- Date: 2022
- Type: Text , Journal article
- Relation: Pest Management Science Vol. 78, no. 5 (2022), p. 1925-1937
- Full Text: false
- Reviewed:
- Description: BACKGROUND: Sitophilus oryzae and Sitophilus zeamais are the two main insect pests that infest stored grain worldwide. Accurate and rapid identification of the two pests is challenging because of their similar appearances. The S. zeamais adults are darker and shinier than S. oryzae in visible light. Convolutional neural network (CNN) can be applied for the effective differentiation due to its high effectiveness in object recognition. RESULTS: We propose a multilayer convolutional structure (MCS) feature extractor to extract insect characteristics within each layer of the CNN architecture. A region proposal network is adopted to determine the location of a potential pest in the wheat background. The precision of classification and the robustness of bounding box regression are increased by including deeper layer variables into the classification and bounding box regression subnets, as well as combining loss functions softmax and smooth L1. The proposed multilayer convolutional structure network (MCSNet) achieves the mean average precision of 87.89 ± 2.36% from the laboratory test, with an average detection speed of 0.182 ± 0.005 s per test. The model was further assessed with the field trials, and the obtained accuracy was 90.35 ± 3.12%. For all test conditions, the average precision for S. oryzae was higher than that for S. zeamais. CONCLUSION: The proposed MCSNet model has demonstrated that it is a fast and accurate method for detecting sibling species from visible light images in both laboratory and field trials. This will ultimately be applied for pest management together with an upgraded industrial camera system, which has been installed in over 100 000 grain depots of China. © 2022 Society of Chemical Industry.
Bidirectional mapping coupled GAN for generalized zero-shot learning
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 31, no. (2022), p. 721-733
- Full Text:
- Reviewed:
- Description: Bidirectional mapping-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen-unseen classes and preserving the distinction between seen-unseen classes is crucial for GZSL methods. However, existing methods only learn the underlying distribution of seen data, although unseen class semantics are available in the GZSL problem setting. Most methods neglect retaining seen-unseen classes distinction and use the learned distribution to recognize seen and unseen data. Consequently, they do not perform well. In this work, we utilize the available unseen class semantics alongside seen class semantics and learn joint distribution through a strong visual-semantic coupling. We propose a bidirectional mapping coupled generative adversarial network (BMCoGAN) by extending the concept of the coupled generative adversarial network into a bidirectional mapping model. We further integrate a Wasserstein generative adversarial optimization to supervise the joint distribution learning. We design a loss optimization for retaining distinctive information of seen-unseen classes in the synthesized features and reducing bias towards seen classes, which pushes synthesized seen features towards real seen features and pulls synthesized unseen features away from real seen features. We evaluate BMCoGAN on benchmark datasets and demonstrate its superior performance against contemporary methods. © 1992-2012 IEEE.
Human pose based video compression via forward-referencing using deep learning
- Authors: Rajin, S.M. Ataul Karim , Murshed, Manzur , Paul, Manoranjan , Teng, Shyh , Ma, Jiangang
- Date: 2022
- Type: Text , Conference paper
- Relation: 2022 IEEE International Conference on Visual Communications and Image Processing, VCIP 2022, Suzhou, China,13-16 December 2022, 2022 IEEE International Conference on Visual Communications and Image Processing, VCIP 2022
- Full Text: false
- Reviewed:
- Description: To exploit high temporal correlations in video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translation motion of the moving objects, it is susceptible to other types of affine motion and object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual frames in future time by predicting the pose using a generative adversarial network (GAN). Therefore, modelling the high-level structure of human pose is able to exploit semantic correlation by predicting human actions and determining its trajectory. Video surveillance applications will benefit as stored 'big' surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding by modelling human pose from the already-encoded frames and using the generated frame at the current time as an additional forward-referencing frame. It is expected that the proposed approach can overcome the limitations of the traditional backward-referencing frames by predicting the blocks containing the moving objects with lower residuals. Our experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high motion video sequences compared to standard video coding. © 2022 IEEE.
Integrated generalized zero-shot learning for fine-grained classification
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 122, no. (2022), p.
- Full Text:
- Reviewed:
- Description: Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and a FS sub-network. Consequently, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. © 2021 Elsevier Ltd
Soil moisture, organic carbon, and nitrogen content prediction with hyperspectral data using regression models
- Authors: Datta, Dristi , Paul, Manoranjan , Murshed, Manzur , Teng, Shyh , Schmidtke, Leigh
- Date: 2022
- Type: Text , Journal article
- Relation: Sensors (Basel, Switzerland) Vol. 22, no. 20 (2022), p.
- Full Text:
- Reviewed:
- Description: Soil moisture, soil organic carbon, and nitrogen content prediction are considered significant fields of study as they are directly related to plant health and food production. Direct estimation of these soil properties with traditional methods, for example, the oven-drying technique and chemical analysis, is a time and resource-consuming approach and can predict only smaller areas. With the significant development of remote sensing and hyperspectral (HS) imaging technologies, soil moisture, carbon, and nitrogen can be estimated over vast areas. This paper presents a generalized approach to predicting three different essential soil contents using a comprehensive study of various machine learning (ML) models by considering the dimensional reduction in feature spaces. In this study, we have used three popular benchmark HS datasets captured in Germany and Sweden. The efficacy of different ML algorithms is evaluated to predict soil content, and significant improvement is obtained when a specific range of bands is selected. The performance of ML models is further improved by applying principal component analysis (PCA), a dimensional reduction method that works with an unsupervised learning method. The effect of soil temperature on soil moisture prediction is evaluated in this study, and the results show that when the soil temperature is considered with the HS band, the soil moisture prediction accuracy does not improve. However, the combined effect of band selection and feature transformation using PCA significantly enhances the prediction accuracy for soil moisture, carbon, and nitrogen content. This study represents a comprehensive analysis of a wide range of established ML regression models using data preprocessing, effective band selection, and data dimension reduction and attempt to understand which feature combinations provide the best accuracy. The outcomes of several ML models are verified with validation techniques and the best- and worst-case scenarios in terms of soil content are noted. The proposed approach outperforms existing estimation techniques.
A novel fusion approach in the extraction of kernel descriptor with improved effectiveness and efficiency
- Authors: Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun , Zhang, Dengsheng
- Date: 2021
- Type: Text , Journal article
- Relation: Multimedia Tools and Applications Vol. 80, no. 10 (Apr 2021), p. 14545-14564
- Full Text:
- Reviewed:
- Description: Image representation using feature descriptors is crucial. A number of histogram-based descriptors are widely used for this purpose. However, histogram-based descriptors have certain limitations and kernel descriptors (KDES) are proven to overcome them. Moreover, the combination of more than one KDES performs better than an individual KDES. Conventionally, KDES fusion is performed by concatenating them after the gradient, colour and shape descriptors have been extracted. This approach has limitations in regard to the efficiency as well as the effectiveness. In this paper, we propose a novel approach to fuse different image features before the descriptor extraction, resulting in a compact descriptor which is efficient and effective. In addition, we have investigated the effect on the proposed descriptor when texture-based features are fused along with the conventionally used features. Our proposed descriptor is examined on two publicly available image databases and shown to provide outstanding performances.
Adversarial network with multiple classifiers for open set domain adaptation
- Authors: Shermin, Tasfia , Lu, Guojun , Teng, Shyh , Murshed, Manzur , Sohel, Ferdous
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 23, no. (2021), p. 2732-2744
- Full Text:
- Reviewed:
- Description: Domain adaptation aims to transfer knowledge from a domain with adequate labeled samples to a domain with scarce labeled samples. Prior research has introduced various open set domain adaptation settings in the literature to extend the applications of domain adaptation methods in real-world scenarios. This paper focuses on the type of open set domain adaptation setting where the target domain has both private ('unknown classes') label space and the shared ('known classes') label space. However, the source domain only has the 'known classes' label space. Prevalent distribution-matching domain adaptation methods are inadequate in such a setting that demands adaptation from a smaller source domain to a larger and diverse target domain with more classes. For addressing this specific open set domain adaptation setting, prior research introduces a domain adversarial model that uses a fixed threshold for distinguishing known from unknown target samples and lacks at handling negative transfers. We extend their adversarial model and propose a novel adversarial domain adaptation model with multiple auxiliary classifiers. The proposed multi-classifier structure introduces a weighting module that evaluates distinctive domain characteristics for assigning the target samples with weights which are more representative to whether they are likely to belong to the known and unknown classes to encourage positive transfers during adversarial training and simultaneously reduces the domain gap between the shared classes of the source and target domains. A thorough experimental investigation shows that our proposed method outperforms existing domain adaptation methods on a number of domain adaptation datasets. © 1999-2012 IEEE.
Integrating line weber local descriptor and deep feature for tire indentation mark image classification
- Authors: Liu, Ying , Che, Xin , Dong, Haitao , Li, Daxiang , Teng, Shyh , Lu, Guojun
- Date: 2021
- Type: Text , Conference paper
- Relation: 4th International Conference on Artificial Intelligence and Pattern Recognition, 4th International Conference on Artificial Intelligence and Pattern Recognition, AIPR 2021,Virtual, Online,17-19 September 2021, 2021, ACM International Conference Proceeding Series p. 56-61
- Full Text: false
- Reviewed:
- Description: Tire indentation mark matching is an essential tool used for the investigation of criminal cases and traffic incidents. As such images are unique and uncommon, there is a lack of dedicated databases and relevant research on this topic. This paper presents a feature extraction algorithm effective for tire indentation mark image description. The main contributions include: (1) Line feature Weber local descriptor (LWLD) is proposed, which uses the Gabor orientations instead of the original gradient orientation. This feature can describe texture information of tire indentation mark image more efficiently. (2) An attention model is constructed to produce attention feature map of tire indentation mark image. This attention feature map is then fused with LWLD resulting in a feature with more powerful representation capability. Experimental results prove that the combined use of LWLD and attention model greatly enhances the performance of tire indentation mark image matching tasks. © 2021 ACM.
Rice freshness identification based on visible near-infrared spectroscopy and colorimetric sensor array
- Authors: Lin, Hao , Jiang, Hao , Lin, Jinjin , Chen, Quansheng , Ali, Shujat , Teng, Shyh , Zuo, Min
- Date: 2021
- Type: Text , Journal article
- Relation: Food Analytical Methods Vol. 14, no. 7 (2021), p. 1305-1314
- Full Text: false
- Reviewed:
- Description: In this work, two methods, which were visible near-infrared spectroscopy (VNIRS) and visible near-infrared spectroscopy combined with colorimetric sensor array (VNIRS-CSA), were used to identify the volatile compound changes of rice samples stored for 0 to 6 months. Principal component analysis (PCA), interval partial least squares (iPLS), and synergy interval partial least squares (SiPLS) were used for qualitative classification. A prediction model was established by linear discriminant analysis (LDA), which was compared with the traditional VNIRS detection technology. The results revealed that the VNIRS-CSA got better performance than VNIRS and exhibited a good result based on iPLS/SiPLS-PCA/LDA models. Furthermore, spectral data from VNIRS-CSA were the best for LDA with a high prediction value of 0.925 after standard normal variate (SNV) processing and variable selection by SiPLS. The research demonstrated that VNIRS-CSA is a quick, accurate, and non-destructive method for monitoring the storage time of rice. The strategy also has the potential for volatile organic components analysis. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature.
Robust image classification using a low-pass activation function and DCT augmentation
- Authors: Hossain, Md Tahmid , Teng, Shyh , Sohel, Ferdous , Lu, Guojun
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 86460-86474
- Full Text:
- Reviewed:
- Description: Convolutional Neural Network's (CNN's) performance disparity on clean and corrupted datasets has recently come under scrutiny. In this work, we analyse common corruptions in the frequency domain, i.e., High Frequency corruptions (HFc, e.g., noise) and Low Frequency corruptions (LFc, e.g., blur). Although a simple solution to HFc is low-pass filtering, ReLU - a widely used Activation Function (AF), does not have any filtering mechanism. In this work, we instill low-pass filtering into the AF (LP-ReLU) to improve robustness against HFc. To deal with LFc, we complement LP-ReLU with Discrete Cosine Transform based augmentation. LP-ReLU, coupled with DCT augmentation, enables a deep network to tackle the entire spectrum of corruption. We use CIFAR-10-C and Tiny ImageNet-C for evaluation and demonstrate improvements of 5% and 7.3% in accuracy respectively, compared to the State-Of-The-Art (SOTA). We further evaluate our method's stability on a variety of perturbations in CIFAR-10-P and Tiny ImageNet-P, achieving new SOTA in these experiments as well. To further strengthen our understanding regarding CNN's lack of robustness, a decision space visualisation process is proposed and presented in this work. © 2013 IEEE.
An Enhanced Local Texture Descriptor for Image Segmentation
- Authors: Tania, Sheikh , Murshed, Manzur , Teng, Shyh , Karmakar, Gour
- Date: 2020
- Type: Text , Conference paper
- Relation: 2020 IEEE International Conference on Image Processing, ICIP 2020 Vol. 2020-October, p. 1526-1530
- Full Text: false
- Reviewed:
- Description: Texture is an indispensable property to develop many vision based autonomous applications. Compared to colour, feature dimension in a local texture descriptor is quite large as dense texture features need to represent the distribution of pixel intensities in the neighbourhood of each pixel. Large dimensional features require additional time for further processing that often restrict real-time applications. In this paper, a robust local texture descriptor is enhanced by reducing feature dimension by three folds without compromising the accuracy in region-based image segmentation applications. Reduction in feature dimension is achieved by exploiting the mean of neighbourhood pixel intensities radially along lines across a certain radius, which eliminates the need for sampling intensity distribution at three scales. Both the results of benchmark metrics and computational time are promising when the enhanced texture feature is used in a region-based hierarchical segmentation algorithm, a recent state-of-the-art technique. © 2020 IEEE.
An enhancement to the spatial pyramid matching for image classification and retrieval
- Authors: Karmakar, Priyabrata , Teng, Shyh , Lu, Guojun , Zhang, Dengsheng
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Access Vol. 8, no. (2020), p. 22463-22472
- Full Text:
- Reviewed:
- Description: Spatial pyramid matching (SPM) is one of the widely used methods to incorporate spatial information into the image representation. Despite its effectiveness, the traditional SPM is not rotation invariant. A rotation invariant SPM has been proposed in the literature but it has many limitations regarding the effectiveness. In this paper, we investigate how to make SPM robust to rotation by addressing those limitations. In an SPM framework, an image is divided into an increasing number of partitions at different pyramid levels. In this paper, our main focus is on how to partition images in such a way that the resulting structure can deal with image-level rotations. To do that, we investigate three concentric ring partitioning schemes. Apart from image partitioning, another important component of the SPM framework is a weight function. To apportion the contribution of each pyramid level to the final matching between two images, the weight function is needed. In this paper, we propose a new weight function which is suitable for the rotation-invariant SPM structure. Experiments based on image classification and retrieval are performed on five image databases. The detailed result analysis shows that we are successful in enhancing the effectiveness of SPM for image classification and retrieval. © 2013 IEEE.
Learning large margin multiple granularity features with an improved siamese network for person re-identification
- Authors: Li, Da-Xiang , Fei, Gy , Teng, Shyh
- Date: 2020
- Type: Text , Journal article
- Relation: Symmetry-Basel Vol. 12, no. 1 (Jan 2020), p. 16
- Full Text:
- Reviewed:
- Description: Person re-identification (Re-ID) is a non-overlapping multi-camera retrieval task to match different images of the same person, and it has become a hot research topic in many fields, such as surveillance security, criminal investigation, and video analysis. As one kind of important architecture for person re-identification, Siamese networks usually adopt standard softmax loss function, and they can only obtain the global features of person images, ignoring the local features and the large margin for classification. In this paper, we design a novel symmetric Siamese network model named Siamese Multiple Granularity Network (SMGN), which can jointly learn the large margin multiple granularity features and similarity metrics for person re-identification. Firstly, two branches for global and local feature extraction are designed in the backbone of the proposed SMGN model, and the extracted features are concatenated together as multiple granularity features of person images. Then, to enhance their discriminating ability, the multiple channel weighted fusion (MCWF) loss function is constructed for the SMGN model, which includes the verification loss and identification loss of the training image pair. Extensive comparative experiments on four benchmark datasets (CUHK01, CUHK03, Market-1501 and DukeMTMC-reID) show the effectiveness of our proposed method and its performance outperforms many state-of-the-art methods.
Network representation learning: From traditional feature learning to deep learning
- Authors: Sun, Ke , Wang, Lei , Xu, Bo , Zhao, Wenhong , Teng, Shyh , Xia, Feng
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Access Vol. 8, no. (2020), p. 205600-205617
- Full Text:
- Reviewed:
- Description: Network representation learning (NRL) is an effective graph analytics technique and promotes users to deeply understand the hidden characteristics of graph data. It has been successfully applied in many real-world tasks related to network science, such as social network data processing, biological information processing, and recommender systems. Deep Learning is a powerful tool to learn data features. However, it is non-trivial to generalize deep learning to graph-structured data since it is different from the regular data such as pictures having spatial information and sounds having temporal information. Recently, researchers proposed many deep learning-based methods in the area of NRL. In this survey, we investigate classical NRL from traditional feature learning method to the deep learning-based model, analyze relationships between them, and summarize the latest progress. Finally, we discuss open issues considering NRL and point out the future directions in this field. © 2020 Institute of Electrical and Electronics Engineers Inc.. All rights reserved.
Rectified softmax loss with all-sided cost sensitivity for age estimation
- Authors: Li, Daxiang , Ma, Xuan , Ren, Yaqiong , Teng, Shyh
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Access Vol. 8, no. (2020), p. 32551-32563
- Full Text:
- Reviewed:
- Description: In Convolutional Neural Network (ConvNet) based age estimation algorithms, softmax loss is usually chosen as the loss function directly, and the problems of Cost Sensitivity (CS), such as class imbalance and misclassification cost difference between different classes, are not considered. Focus on these problems, this paper constructs a rectified softmax loss function with all-sided CS, and proposes a novel cost-sensitive ConvNet based age estimation algorithm. Firstly, a loss function is established for each age category to solve the imbalance of the number of training samples. Then, a cost matrix is defined to reflect the cost difference caused by misclassification between different classes, thus constructing a new cost-sensitive error function. Finally, the above methods are merged to construct a rectified softmax loss function for ConvNet model, and a corresponding Back Propagation (BP) training scheme is designed to enable ConvNet network to learn robust face representation for age estimation during the training phase. Simultaneously, the rectified softmax loss is theoretically proved that it satisfies the general conditions of the loss function used for classification. The effectiveness of the proposed method is verified by experiments on face image datasets of different races. © 2013 IEEE.