Integrated generalized zero-shot learning for fine-grained classification
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 122, no. (2022), p.
- Full Text:
- Reviewed:
- Description: Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and an FS sub-network. Thus, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. © 2021 Elsevier Ltd
Depth sequence coding with hierarchical partitioning and spatial-domain quantization
- Authors: Shahriyar, Shampa , Murshed, Manzur , Ali, Mortuza , Paul, Manoranjan
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 30, no. 3 (2020), p. 835-849
- Full Text:
- Reviewed:
- Description: Depth coding in 3D-HEVC deforms object shapes due to block-level edge-approximation and lacks efficient techniques to exploit the statistical redundancy, due to the frame-level clustering tendency in depth data, for higher coding gain at near-lossless quality. This paper presents a standalone mono-view depth sequence coder, which preserves edges implicitly by limiting quantization to the spatial-domain and exploits the frame-level clustering tendency efficiently with a novel binary tree-based decomposition (BTBD) technique. The BTBD can exploit the statistical redundancy in frame-level syntax, motion components, and residuals efficiently with fewer block-level prediction/coding modes and simpler context modeling for context-adaptive arithmetic coding. Compared with the depth coder in 3D-HEVC, the proposed one has achieved significantly lower bitrate at lossless to near-lossless quality range for mono-view coding and rendered superior quality synthetic views from the depth maps, compressed at the same bitrate, and the corresponding texture frames. © 1991-2012 IEEE.
An efficient RANSAC hypothesis evaluation using sufficient statistics for RGB-D pose estimation
- Authors: Senthooran, Ilankaikone , Murshed, Manzur , Barca, Jan , Kamruzzaman, Joarder , Chung, Hoam
- Date: 2019
- Type: Text , Journal article
- Relation: Autonomous Robots Vol. 43, no. 5 (2019), p. 1257-1270
- Full Text:
- Reviewed:
- Description: Achieving autonomous flight in GPS-denied environments begins with pose estimation in three-dimensional space, and this is much more challenging for an MAV in a swarm robotic system due to limited computational resources. In vision-based pose estimation, outlier detection is the most time-consuming step. This usually involves a RANSAC procedure using the reprojection-error method for hypothesis evaluation. The realignment-based hypothesis evaluation method is observed to be more accurate, but its considerably slower speed makes it unsuitable for robots with limited resources. We use sufficient statistics of least-squares minimisation to speed up this process. The additive nature of these sufficient statistics makes it possible to compute pose estimates in each evaluation by reusing previously computed statistics, so estimates need not be calculated from scratch each time. The proposed method is tested on standard RANSAC, Preemptive RANSAC and R-RANSAC using benchmark datasets. The results show that the use of sufficient statistics speeds up the outlier detection process with realignment hypothesis evaluation for all RANSAC variants, achieving a speed-up of up to 6.72 times.
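The additive sufficient-statistics idea above can be illustrated on a much simpler problem than 6-DoF pose estimation: RANSAC line fitting, where each point contributes the statistics (1, x, y, x², xy) and any subset's least-squares fit is obtained by summing statistics rather than revisiting points. This is a minimal sketch of the principle only, not the authors' RGB-D pipeline; all function names and parameter values are illustrative.

```python
import random

def point_stats(x, y):
    # Per-point sufficient statistics for least-squares line fitting:
    # (count, sum_x, sum_y, sum_xx, sum_xy).
    return (1, x, y, x * x, x * y)

def add_stats(s, t):
    # Additivity: statistics of a point set = sum of per-point statistics.
    return tuple(a + b for a, b in zip(s, t))

def fit_from_stats(s):
    # Closed-form least-squares line y = a*x + b from the statistics alone.
    n, sx, sy, sxx, sxy = s
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def ransac_line(points, iters=50, tol=0.5, seed=0):
    rng = random.Random(seed)
    stats = [point_stats(x, y) for x, y in points]
    best = None
    for _ in range(iters):
        i, j = rng.sample(range(len(points)), 2)
        if points[i][0] == points[j][0]:
            continue  # degenerate minimal sample (vertical line)
        a, b = fit_from_stats(add_stats(stats[i], stats[j]))
        # Hypothesis evaluation: accumulate inlier statistics additively,
        # so the refined ("realignment") fit needs no second pass over points.
        acc, inliers = (0, 0.0, 0.0, 0.0, 0.0), 0
        for (x, y), s in zip(points, stats):
            if abs(y - (a * x + b)) < tol:
                acc, inliers = add_stats(acc, s), inliers + 1
        if best is None or inliers > best[0]:
            best = (inliers, fit_from_stats(acc))
    return best

pts = [(x, 2.0 * x + 1.0) for x in range(20)] + [(5, 40.0), (7, -30.0)]
count, (a, b) = ransac_line(pts)
print(count, a, b)  # the 20 collinear points are recovered, with a≈2, b≈1
```

The key point mirrors the paper: the refined fit over the consensus set costs a constant-size statistics sum per inlier instead of a fresh least-squares solve over raw data.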
A novel depth edge prioritization based coding technique to boost-UP HEVC performance
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2016
- Type: Text , Conference paper
- Relation: 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)
- Full Text: false
- Reviewed:
- Description: In addition to texture, multiview video employs depth coding for the reconstruction of 3D video and free-viewpoint video. Relying on texture-depth correlations, a number of methods in the literature reuse texture motion vectors for the corresponding depth coding to reduce encoding time by avoiding the costly motion estimation process. However, the texture similarity metric is not always equivalent to the corresponding depth similarity metric, especially at edges. Since these approaches could not explicitly detect and encode acute edge motions of depth objects, they could not reach similar or improved rate-distortion (RD) performance against the High Efficiency Video Coding (HEVC) reference test model (HM). For more accurate motion detection and modeling, the proposed technique exploits an extra Pattern Mode comprising a group of pattern templates (GPTs) with rectangular and non-rectangular object shapes and edges not covered by the existing HEVC block partitioning modes. Moreover, the proposed Pattern Mode encodes only the motion areas and skips the background areas. The experimental results show that the proposed technique saves 30% encoding time and improves the Bjøntegaard Delta peak signal-to-noise ratio (BD-PSNR) by 0.1 dB on average compared to the HM.
Search and tracking algorithms for swarms of robots: A survey
- Authors: Senanayake, Madhubhashi , Senthooran, Ilankaikone , Barca, Jan , Chung, Hoam , Kamruzzaman, Joarder , Murshed, Manzur
- Date: 2016
- Type: Text , Journal article
- Relation: Robotics and Autonomous Systems Vol. 75, no. Part B (2016), p. 422-434
- Full Text: false
- Reviewed:
- Description: Target search and tracking is a classical but difficult problem in many research domains, including computer vision, wireless sensor networks and robotics. We review the seminal works that addressed this problem in the area of swarm robotics, which is the application of swarm intelligence principles to the control of multi-robot systems. Robustness, scalability and flexibility, as well as distributed sensing, make swarm robotic systems well suited for the problem of target search and tracking in real-world applications. We classify the works we review according to the variations and aspects of the search and tracking problems they addressed. As this is a particularly application-driven research area, the adopted taxonomy makes this review serve as a quick reference guide to our readers in identifying related works and approaches according to their problem at hand. By no means is this an exhaustive review, but an overview for researchers who are new to the swarm robotics field, to help them easily start off their research. © 2015 Elsevier B.V.
Foreground motion and spatial saliency-based efficient HEVC Video Coding
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2015
- Type: Text , Conference paper
- Relation: 2015 International Conference on Image and Vision Computing New Zealand (IVCNZ)
- Full Text: false
- Reviewed:
- Description: High Efficiency Video Coding (HEVC) cannot provide real-time facilities to electronic devices with limited processing power and battery, as its encoding time complexity increases several-fold compared to its predecessor. Numerous researchers have addressed this limitation by reducing the number of motion estimation (ME) modes, analyzing homogeneity, residual and statistical correlation among different modes. Although these approaches save some encoding time, they could not reach rate-distortion (RD) performance similar to the HEVC encoder, as they merely depend on the existing Lagrangian cost function (LCF) within the HEVC framework. To overcome this limitation, in this paper we capture visually attentive foreground motion and salient regions (FMSR), which are sensitive to the human visual system for quality assessment. The FMSR features, captured by visual attention and dynamic background modeling, are adaptively synthesized to determine a subset of candidate modes. This preprocessing phase is independent of the LCF. Since the proposed technique avoids exhaustive exploration of all modes with simple criteria, it reduces encoding time by 27% on average. With efficient selection of FMSR-based appropriate block partitioning modes, it can also improve the peak signal-to-noise ratio (PSNR) by up to 1.0 dB.
Exploiting spatial smoothness to recover undecoded coefficients for transform domain distributed video coding
- Authors: Ali, Mortuza , Murshed, Manzur
- Date: 2013
- Type: Text , Conference paper
- Relation: IEEE International Conference on Image Processing; Melbourne, Australia; 15th-18th September 2013, p. 1782-1786
- Relation: http://purl.org/au-research/grants/arc/DP1095487
- Full Text: false
- Reviewed:
- Description: In a transform domain distributed video coding scheme, the correlation between the current encoding unit, e.g. block and slice, and the corresponding side-information is modeled using a virtual channel. This correlation model is then used for rate allocation, quantization, and Wyner-Ziv coding. Since the encoder can only have an estimate of the correlation instead of the exact knowledge of the side-information, the decoder will fail to recover the quantized transformed coefficients with a nonzero probability. In this paper, we propose to integrate a scheme at the decoder to recover the undecoded coefficients using the spatial smoothness property of individual video frames. Simulation results demonstrated that, at different decoding failure probabilities, a transformed coefficient recovery scheme can significantly improve the quality of videos in terms of both PSNR and SSIM.
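The smoothness-based recovery step described above can be sketched in a simplified one-dimensional form: for an undecoded coefficient the decoder knows only the quantization interval it must lie in, and it picks the value in that interval closest to a smooth prediction from successfully decoded neighbours. This is an illustrative analogue, not the paper's transform-domain formulation; the function name and interval are hypothetical.

```python
def recover_undecoded(left, right, lo, hi):
    # The decoder failed to Wyner-Ziv-decode this sample but knows the
    # quantization interval [lo, hi] it must lie in. Exploit spatial
    # smoothness: predict from the decoded neighbours, then clamp the
    # prediction into the known interval.
    pred = 0.5 * (left + right)
    return min(max(pred, lo), hi)

# Neighbours suggest 12.5, but the known interval is [16, 24]: clamp to 16.
print(recover_undecoded(12.0, 13.0, 16.0, 24.0))  # 16.0
```

When the smooth prediction already falls inside the interval, it is kept unchanged, which is what makes the recovered value consistent with both the channel code and the smoothness prior.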
On temporal order invariance for view-invariant action recognition
- Authors: Ul-Haq, Anwaar , Gondal, Iqbal , Murshed, Manzur
- Date: 2013
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 23, no. 2 (2013), p. 203-211
- Full Text: false
- Reviewed:
- Description: View-invariant action recognition is one of the most challenging problems in computer vision. Various representations are being devised for matching actions across different viewpoints to achieve view invariance. In this paper, we explore the invariance property of temporal order of action instances during action execution and utilize it for devising a new view-invariant action recognition approach. To ensure temporal order during matching, we utilize spatiotemporal features, feature fusion and a temporal order consistency constraint. We start by extracting spatiotemporal cuboid features from video sequences and applying feature fusion to encapsulate within-class similarity for the same viewpoints. For each action class, we construct a feature fusion table to facilitate feature matching across different views. An action matching score is then calculated based on the global temporal order constraint and the number of matching features. Finally, the action label of the class with the maximum value of the matching score is assigned to the query action. Experimentation is performed on multiple view Inria Xmas motion acquisition sequences and West Virginia University action datasets, with encouraging results that are comparable to the existing view-invariant action recognition techniques.
Perception-inspired background subtraction
- Authors: Haque, Mahfuzul , Murshed, Manzur
- Date: 2013
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 23, no. 12 (2013), p. 2127-2140
- Full Text: false
- Reviewed:
- Description: Developing universal and context-invariant methods is one of the hardest challenges in computer vision. Background subtraction (BS), an essential precursor in most machine vision applications used for foreground detection, is no exception. Due to overreliance on statistical observations, most BS techniques show unpredictable behavior in dynamic unconstrained scenarios in which the characteristics of the operating environment are either unknown or change drastically. To achieve superior foreground detection quality across unconstrained scenarios, we propose a new technique, called perception-inspired background subtraction (PBS), which avoids overreliance on statistical observations by making key modeling decisions based on the characteristics of human visual perception. PBS exploits the human perception-inspired confidence interval to associate an observed intensity value with another intensity value during both model learning and background-foreground classification. The concept of perception-inspired confidence interval is also used for identifying redundant samples, thus ensuring the optimal number of samples in the background model. Furthermore, PBS dynamically varies the model adaptation speed (learning rate) at pixel level based on observed scene dynamics to ensure faster adaptation of changed background regions, as well as longer retention of stationary foregrounds. Extensive experimental evaluations on a wide range of benchmark datasets validate the efficacy of PBS compared to the state of the art for unconstrained video analytics.
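Two of the central ideas above, matching an observation against stored samples via a perceptual confidence interval and refusing to store redundant samples, can be sketched for a single pixel as follows. The interval width, blending factor and sample cap are illustrative placeholders, not the values or the exact update rules used by PBS.

```python
def classify_pixel(model, intensity, delta=10.0, max_samples=5, lr=0.1):
    # model: list of background sample intensities for one pixel.
    # Background if the observation falls within the confidence
    # interval [s - delta, s + delta] of any stored sample.
    for i, s in enumerate(model):
        if abs(intensity - s) <= delta:
            model[i] = (1.0 - lr) * s + lr * intensity  # adapt matched sample
            return "background"
    # No interval matched: classify as foreground. Because it lies outside
    # every existing interval it is not redundant, so learn it as a new
    # sample if the model is not yet full.
    if len(model) < max_samples:
        model.append(float(intensity))
    return "foreground"

model = [100.0]
print(classify_pixel(model, 103))  # background; the sample drifts toward 103
print(classify_pixel(model, 150))  # foreground; 150.0 stored as a new sample
```

The redundancy test is implicit: any intensity already covered by a stored sample's interval updates that sample instead of adding a new one, which keeps the per-pixel model small.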
Predictive coding of integers with real-valued predictions
- Authors: Ali, Mortuza , Murshed, Manzur
- Date: 2013
- Type: Text , Conference paper
- Relation: DCC 2013 Data Compression Conference; Snowbird, USA; 20th-22nd March 2013; p. 431-440
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text: false
- Reviewed:
- Description: In this paper, we have extended the Rice-Golomb code so that it can operate at fractional precision to efficiently exploit the real-valued predictions. Coding at infinitesimal precision allows the residuals to be modeled with the Laplace distribution. Unlike the Rice-Golomb code, which maps equally probable opposite-signed residuals to different integers, the proposed coding scheme is symmetric in the sense that, at infinitesimal precision, it assigns code words of equal length to equally probable residual intervals. The symmetry of both the Laplace distribution and the coding scheme facilitates the analysis of the proposed coding scheme to determine the average code-length and the optimal value of the associated coding parameter.
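The asymmetry the paper removes is easy to see in the integer-precision baseline: the usual signed-to-unsigned ("zigzag") mapping sends equally probable residuals +v and -v to different integers, so their Rice-Golomb codewords can differ in length. The following is a minimal sketch of that baseline code, not of the proposed fractional-precision scheme.

```python
def zigzag(v):
    # Map a signed residual to a non-negative integer:
    # 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    return (v << 1) if v >= 0 else (-v << 1) - 1

def rice_encode(n, k):
    # Rice-Golomb code with parameter k: unary quotient, then k-bit remainder.
    bits = "1" * (n >> k) + "0"
    for i in range(k - 1, -1, -1):
        bits += "1" if (n >> i) & 1 else "0"
    return bits

def rice_decode(bits, k):
    # Returns (decoded value, number of bits consumed).
    q = bits.index("0")                     # unary quotient
    r = int("0" + bits[q + 1:q + 1 + k], 2)  # k-bit remainder
    return (q << k) | r, q + 1 + k

# +1 and -1 are equally probable under a symmetric Laplace distribution,
# yet their codewords differ in length: "100" (3 bits) vs "01" (2 bits).
print(rice_encode(zigzag(1), 1), rice_encode(zigzag(-1), 1))
```

This one-bit imbalance between equally probable residuals is exactly what the symmetric fractional-precision code in the paper is designed to eliminate.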
Efficient pattern index coding using syndrome coding and side information
- Authors: Paul, Manoranjan , Murshed, Manzur
- Date: 2012
- Type: Text , Journal article
- Relation: International Journal of Engineering and Industries Vol. 3, no. 3 (2012), p. 1-12
- Full Text: false
- Reviewed:
- Description: Pattern-based video coding focusing on moving regions has already established its superiority over the H.264 at very low bit rate. Up to a certain limit, the larger the number of pattern templates, the better the approximation to the moving regions. However, beyond that limit no coding gain is observed due to the need of an excessive number of pattern identification bits. Recently, distributed video coding schemes have used syndrome coding to predict the original information in the decoder using side information. In this paper a pattern identification scheme is proposed which predicts the pattern from the syndrome codes and side information in the decoder so that the actual pattern identification code is not needed. The experimental results confirm that the new scheme improves the rate-distortion performance compared to both the existing pattern-based video coding and the H.264 standard. The proposed new scheme will also present opportunities for further syndrome coding application.
Novel local improvement techniques in clustered memetic algorithm for protein structure prediction
- Authors: Islam, Md Kamrul , Chetty, Madhu , Murshed, Manzur
- Date: 2011
- Type: Text , Conference paper
- Relation: IEEE Congress on Evolutionary Computation (IEEE CEC) p. 1003-1011
- Full Text: false
- Reviewed:
- Description: Evolutionary algorithms (EAs) often fail to find the global optimum due to genetic drift. As the protein structure prediction problem is multimodal having several global optima, EAs empowered with combined application of local and global search e.g., memetic algorithms, can be more effective. This paper introduces two novel local improvement techniques for the clustered memetic algorithm to incorporate both problem specific and search-space specific knowledge to find one of the optimum structures of a hydrophobic-polar protein sequence on lattice models. Experimental results show the superiority of the proposed techniques against existing EAs on benchmark sequences.
On dynamic scene geometry for view-invariant action matching
- Authors: Ul-Haq, Anwaar , Gondal, Iqbal , Murshed, Manzur
- Date: 2011
- Type: Text , Conference paper
- Relation: 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR) p. 3305-3312
- Full Text: false
- Reviewed:
- Description: Variation in viewpoints poses significant challenges to action recognition. One popular way of encoding view-invariant action representation is based on the exploitation of epipolar geometry between different views of the same action. The majority of representative work considers detection of landmark points and their tracking, assuming that motion trajectories for all landmark points on the human body are available throughout the course of an action. Unfortunately, due to occlusion and noise, detection and tracking of these landmarks is not always robust. To facilitate it, some of the work assumes that such trajectories are manually marked, which is a clear drawback and lacks the automation introduced by computer vision. In this paper, we address this problem by proposing a view-invariant action matching score based on epipolar geometry between actor silhouettes, without tracking and explicit point correspondences. In addition, we explore the multi-body epipolar constraint, which facilitates working on original action volumes without any pre-processing. We show that the multi-body fundamental matrix captures the geometry of dynamic action scenes and helps devise an action matching score across different views without any prior segmentation of actors. Extensive experimentation on challenging view-invariant action datasets shows that our approach not only removes long-standing assumptions but also achieves significant improvement in recognition accuracy and retrieval.
A novel color image fusion QoS measure for multi-sensor night vision applications
- Authors: Ul-Haq, Anwaar , Gondal, Iqbal , Murshed, Manzur
- Date: 2010
- Type: Text , Conference proceedings
- Full Text: false
- Description: Color image fusion of visible and infra-red imagery can play an important role in multi-sensor night vision systems that are an integral part of modern warfare. Image fusion minimizes the amount of required bandwidth by transmitting the fused image rather than multiple sensor images. Color image fusion can be achieved by combining inputs from original colored sensors or by employing pseudo colorization and color transfer to grayscale images. Various quality measures have been proposed for multi-sensor grayscale image fusion techniques, but no appropriate quality measure has been devised for the quality evaluation of multi-sensor color image fusion. In this paper, we propose a novel color image fusion quality measure, the Color Fusion Objective Index (CFOI), based on colorfulness, gradient similarity and mutual information techniques. Experimental results show the effectiveness of CFOI in evaluating the color and salient feature extraction introduced by color fusion techniques into the final fused imagery, as well as its consistency with subjective evaluation.
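One of the three ingredients named above, colorfulness, is commonly computed with the Hasler-Süsstrunk opponent-channel statistic; a plain-Python sketch is below. This illustrates only that ingredient as it is usually defined in the literature, not the CFOI combination itself, whose exact weighting is not reproduced here.

```python
import math

def colorfulness(pixels):
    # Hasler-Susstrunk colorfulness over a list of (R, G, B) pixels:
    # combines the spread and mean magnitude of the opponent channels
    # rg = R - G and yb = 0.5 * (R + G) - B.
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]

    def mean_std(ch):
        m = sum(ch) / len(ch)
        return m, math.sqrt(sum((v - m) ** 2 for v in ch) / len(ch))

    m_rg, s_rg = mean_std(rg)
    m_yb, s_yb = mean_std(yb)
    return math.hypot(s_rg, s_yb) + 0.3 * math.hypot(m_rg, m_yb)

gray = [(v, v, v) for v in (0, 64, 128, 255)]
vivid = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
print(colorfulness(gray), colorfulness(vivid))  # 0.0 for gray, large for vivid
```

Achromatic images score exactly zero because both opponent channels vanish, which makes the metric a natural probe for how much color a fusion method injects into the result.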
Automated multi-sensor color video fusion for nighttime video surveillance
- Authors: Ul-Haq, Anwaar , Gondal, Iqbal , Murshed, Manzur
- Date: 2010
- Type: Text , Conference proceedings
- Full Text: false
- Description: In this paper, we present an automated color-transfer-based video fusion method to attain real-time color night vision capability for nighttime video surveillance. We utilize a simple RGB color transfer technique to fuse pseudo-colored video frames without conversion to any uncorrelated color space. We observed that the final color fusion results greatly depend on the selection of the target color image. Therefore, rather than using an arbitrary target color image based on mere general visual anticipation, we automate target color image selection using structural similarity and color saturation. We further apply color enhancement to improve the final appearance of the color-fused images. Subjective and objective quality evaluations indicate the effectiveness of our color video fusion method for nighttime video surveillance applications.
Feature weighting and retrieval methods for dynamic texture motion features
- Authors: Rahman, Ashfaqur , Murshed, Manzur
- Date: 2010
- Type: Text , Journal article
- Relation: International Journal of Computational Intelligence Systems Vol. 2, no. 1 (2010), p. 27-38
- Full Text:
- Reviewed:
- Description: Feature weighting methods are commonly used to find the relative significance among a set of features that are effectively used by the retrieval methods to search image sequences efficiently from large databases. As evidenced in the current literature, dynamic textures (image sequences with regular motion patterns) can be effectively modelled by a set of spatial and temporal motion distribution features such as the motion co-occurrence matrix. The aim of this paper is to develop effective feature weighting and retrieval methods for a set of dynamic textures characterized by motion co-occurrence matrices.
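A motion co-occurrence matrix of the kind mentioned above can be sketched as follows: quantize motion into a small set of labels (e.g. direction bins) and count how often label pairs co-occur at a given spatial offset. This illustrates the general feature, not the paper's exact definition or its weighting scheme; the grid and offset are illustrative.

```python
def motion_cooccurrence(labels, offset, num_labels):
    # labels: 2-D grid of quantized motion labels for one frame.
    # Count co-occurrences of label pairs at positions separated by
    # offset = (dr, dc); M[a][b] counts label a co-occurring with b.
    dr, dc = offset
    rows, cols = len(labels), len(labels[0])
    M = [[0] * num_labels for _ in range(num_labels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                M[labels[r][c]][labels[r2][c2]] += 1
    return M

grid = [[0, 0, 1],
        [0, 1, 1],
        [2, 2, 2]]
M = motion_cooccurrence(grid, (0, 1), 3)
print(M[0][0], M[0][1], M[2][2])  # horizontal-neighbour label-pair counts
```

Flattening such a matrix (or several, over different offsets) yields the fixed-length feature vector on which per-dimension weighting and retrieval can then operate.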
Motion compensation for block-based lossless video coding using lattice-based binning
- Authors: Ali, Mortuza , Murshed, Manzur
- Date: 2010
- Type: Text , Conference paper
- Full Text: false
- Reviewed:
- Description: A block-based lossless video coding scheme using the notion of binning has been proposed in [1]. To further improve the compression and reduce the complexity, in this paper we investigate the impact of two sub-optimal motion search algorithms on the performance of this lattice-based scheme. While one of the algorithms tries to avoid motion vectors, the other tries to reduce complexity. Our experimental results demonstrate that the loss due to sub-optimal motion search outweighs the gain when motion vectors are avoided. However, there is negligible performance loss when the low-complexity sub-optimal three-step search is used.
Scarf: Semi-automatic colorization and reliable image fusion
- Authors: Ul-Haq, Anwaar , Gondal, Iqbal , Murshed, Manzur
- Date: 2010
- Type: Text , Conference paper
- Relation: 2010 Digital Image Computing: Techniques and Applications p. 435-440
- Full Text: false
- Reviewed:
- Description: Nighttime imagery poses significant challenges to its enhancement due to loss of color information and the limitation of a single sensor to capture complete visual information at night. To cope with this challenge, multiple sensors are used to capture reliable nighttime imagery, which presents additional demands for reliable visual information fusion. In this paper, we present a system, Scarf, which performs reliable image fusion using advanced feature extraction techniques and a novel semi-automatic colorization based on optimization conformal to the human visual system. Subjective and objective quality evaluation proves the effectiveness of the proposed system.
Video coding focusing on block partitioning and occlusion
- Authors: Paul, Manoranjan , Murshed, Manzur
- Date: 2010
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 19, no. 3 (2010), p. 691-701
- Full Text: false
- Reviewed:
- Description: Among the existing block partitioning schemes, the pattern-based video coding (PVC) has already established its superiority at low bit-rate. Its innovative segmentation process with regular-shaped pattern templates is very fast as it avoids handling the exact shape of the moving objects. It also judiciously encodes the pattern-uncovered background segments capturing a high level of interblock temporal redundancy without any motion compensation, which is favoured by the rate-distortion optimizer at low bit-rates. The existing PVC technique, however, uses a number of content-sensitive thresholds, and thus setting them to any predefined values risks ignoring some of the macroblocks that would otherwise be encoded with patterns. Furthermore, occluded background can potentially degrade the performance of this technique. In this paper, a robust PVC scheme is proposed by removing all the content-sensitive thresholds, introducing a new similarity metric, considering multiple top-ranked patterns by the rate-distortion optimizer, and refining the Lagrangian multiplier of the H.264 standard for efficient embedding. A novel pattern-based residual encoding approach is also integrated to address the occlusion issue. Once embedded into the H.264 Baseline profile, the proposed PVC scheme perceptibly improves image quality by at least 0.5 dB in low bit-rate video coding applications. A similar trend is observed for moderate to high bit-rate applications when the proposed scheme replaces the bi-directional predictive mode in the H.264 High profile.
A novel pattern identification scheme using distributed video coding concepts
- Authors: Paul, Manoranjan , Murshed, Manzur
- Date: 2009
- Type: Text , Conference paper
- Relation: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009) p. 729-732
- Full Text: false
- Reviewed:
- Description: Pattern-based video coding, focusing on the moving region in a macroblock, has already established its superiority over the recent H.264 video coding standard at very low bit rates. Obviously, a large number of pattern templates approximates the moving regions better; however, beyond a certain limit no coding gain is observed due to the increased number of pattern identification bits. Recently, distributed video coding schemes have used syndrome coding to predict the original information in the decoder using side information. In this paper, a novel pattern identification scheme is proposed which predicts the pattern from the syndrome codes and side information in the decoder, so that the actual pattern identification number is not needed in the bitstream. The experimental results confirm that this new scheme successfully improves the rate-distortion performance compared to the existing pattern-based video coding as well as the H.264 standard. This new scheme will also open another window of syndrome coding application.