Integrated generalized zero-shot learning for fine-grained classification
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 122, no. (2022), p.
- Full Text:
- Reviewed:
- Description: Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and a FS sub-network. Consequently, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. © 2021 Elsevier Ltd
Depth sequence coding with hierarchical partitioning and spatial-domain quantization
- Authors: Shahriyar, Shampa , Murshed, Manzur , Ali, Mortuza , Paul, Manoranjan
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 30, no. 3 (2020), p. 835-849
- Full Text:
- Reviewed:
- Description: Depth coding in 3D-HEVC deforms object shapes due to block-level edge-approximation and lacks efficient techniques to exploit the statistical redundancy, due to the frame-level clustering tendency in depth data, for higher coding gain at near-lossless quality. This paper presents a standalone mono-view depth sequence coder, which preserves edges implicitly by limiting quantization to the spatial-domain and exploits the frame-level clustering tendency efficiently with a novel binary tree-based decomposition (BTBD) technique. The BTBD can exploit the statistical redundancy in frame-level syntax, motion components, and residuals efficiently with fewer block-level prediction/coding modes and simpler context modeling for context-adaptive arithmetic coding. Compared with the depth coder in 3D-HEVC, the proposed one has achieved significantly lower bitrate at lossless to near-lossless quality range for mono-view coding and rendered superior quality synthetic views from the depth maps, compressed at the same bitrate, and the corresponding texture frames. © 1991-2012 IEEE.
Perception-inspired background subtraction
- Authors: Haque, Mahfuzul , Murshed, Manzur
- Date: 2013
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 23, no. 12 (2013 2013), p. 2127-2140
- Full Text: false
- Reviewed:
- Description: Developing universal and context-invariant methods is one of the hardest challenges in computer vision. Background subtraction (BS), an essential precursor in most machine vision applications used for foreground detection, is no exception. Due to overreliance on statistical observations, most BS techniques show unpredictable behavior in dynamic unconstrained scenarios in which the characteristics of the operating environment are either unknown or change drastically. To achieve superior foreground detection quality across unconstrained scenarios, we propose a new technique, called perception-inspired background subtraction (PBS), which avoids overreliance on statistical observations by making key modeling decisions based on the characteristics of human visual perception. PBS exploits the human perception-inspired confidence interval to associate an observed intensity value with another intensity value during both model learning and background-foreground classification. The concept of perception-inspired confidence interval is also used for identifying redundant samples, thus ensuring the optimal number of samples in the background model. Furthermore, PBS dynamically varies the model adaptation speed (learning rate) at pixel level based on observed scene dynamics to ensure faster adaptation of changed background regions, as well as longer retention of stationary foregrounds. Extensive experimental evaluations on a wide range of benchmark datasets validate the efficacy of PBS compared to the state of the art for unconstraint video analytics.
Video coding focusing on block partitioning and occlusion
- Authors: Paul, Manoranjan , Murshed, Manzur
- Date: 2010
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 19, no. 3 (2010), p. 691-701
- Full Text: false
- Reviewed:
- Description: Among the existing block partitioning schemes, the pattern-based video coding (PVC) has already established its superiority at low bit-rate. Its innovative segmentation process with regular-shaped pattern templates is very fast as it avoids handling the exact shape of the moving objects. It also judiciously encodes the pattern-uncovered background segments capturing high level of interblock temporal redundancy without any motion compensation, which is favoured by the rate-distortion optimizer at low bit-rates. The existing PVC technique, however, uses a number of content-sensitive thresholds and thus setting them to any predefined values risks ignoring some of the macroblocks that would otherwise be encoded with patterns. Furthermore, occluded background can potentially degrade the performance of this technique. In this paper, a robust PVC scheme is proposed by removing all the content-sensitive thresholds, introducing a new similarity metric, considering multiple top-ranked patterns by the rate-distortion optimizer, and refining the Lagrangian multiplier of the H.264 standard for efficient embedding. A novel pattern-based residual encoding approach is also integrated to address the occlusion issue. Once embedded into the H.264 Baseline profile, the proposed PVC scheme improves the image quality perceptually significantly by at least 0.5 dB in low bit-rate video coding applications. A similar trend is observed for moderate to high bit-rate applications when the proposed scheme replaces the bi-directional predictive mode in the H.264 High profile.
Detection of multiple dynamic textures using feature space mapping
- Authors: Rahman, Ashfaqur , Murshed, Manzur
- Date: 2009
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 19, no. 5 (2009), p. 766-771
- Full Text: false
- Reviewed:
- Description: Abstract— Image sequences of smoke, fire, etc. are known as dynamic textures. Research is mostly limited to characterization of single dynamic textures. In this paper we address the problem of detecting the presence of multiple dynamic textures in an image sequence by establishing a correspondence between the feature space of dynamic textures and that of their mixture in an image sequence. Accuracy of our proposed technique is both analytically and empirically established with detection experiments yielding 92.5% average accuracy on a diverse set of dynamic texture mixtures in synthetically generated as well as real-world image sequences.