A coarse representation of frames oriented video coding by leveraging cuboidal partitioning of image data
- Ahmmed, Ashe, Paul, Manoranjan, Murshed, Manzur, Taubman, David
- Authors: Ahmmed, Ashe , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2020
- Type: Text , Conference paper
- Relation: 22nd IEEE International Workshop on Multimedia Signal Processing, MMSP 2020, Virtual Tampere, Finland 21-24 September 2020
- Full Text:
- Reviewed:
- Description: Video coding algorithms attempt to minimize the significant commonality that exists within a video sequence. Each new video coding standard contains tools that can perform this task more efficiently compared to its predecessors. In this work, we form a coarse representation of the current frame by minimizing commonality within that frame while preserving important structural properties of the frame. The building blocks of this coarse representation are rectangular regions called cuboids, which are computationally simple and has a compact description. Then we propose to employ the coarse frame as an additional source for predictive coding of the current frame. Experimental results show an improvement in bit rate savings over a reference codec for HEVC, with minor increase in the codec computational complexity. © 2020 IEEE.
- Authors: Ahmmed, Ashe , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2020
- Type: Text , Conference paper
- Relation: 22nd IEEE International Workshop on Multimedia Signal Processing, MMSP 2020, Virtual Tampere, Finland 21-24 September 2020
- Full Text:
- Reviewed:
- Description: Video coding algorithms attempt to minimize the significant commonality that exists within a video sequence. Each new video coding standard contains tools that can perform this task more efficiently compared to its predecessors. In this work, we form a coarse representation of the current frame by minimizing commonality within that frame while preserving important structural properties of the frame. The building blocks of this coarse representation are rectangular regions called cuboids, which are computationally simple and has a compact description. Then we propose to employ the coarse frame as an additional source for predictive coding of the current frame. Experimental results show an improvement in bit rate savings over a reference codec for HEVC, with minor increase in the codec computational complexity. © 2020 IEEE.
A novel no-reference subjective quality metric for free viewpoint video using human eye movement
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 8th Pacific-Rim Symposium on Image and Video Technology, PSIVT 2017; Wuhan, China; 20th-24th November 2017; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10749 LNCS, p. 237-251
- Full Text:
- Reviewed:
- Description: The free viewpoint video (FVV) allows users to interactively control the viewpoint and generate new views of a dynamic scene from any 3D position for better 3D visual experience with depth perception. Multiview video coding exploits both texture and depth video information from various angles to encode a number of views to facilitate FVV. The usual practice for the single view or multiview quality assessment is characterized by evolving the objective quality assessment metrics due to their simplicity and real time applications such as the peak signal-to-noise ratio (PSNR) or the structural similarity index (SSIM). However, the PSNR or SSIM requires reference image for quality evaluation and could not be successfully employed in FVV as the new view in FVV does not have any reference view to compare with. Conversely, the widely used subjective estimator- mean opinion score (MOS) is often biased by the testing environment, viewers mode, domain knowledge, and many other factors that may actively influence on actual assessment. To address this limitation, in this work, we devise a no-reference subjective quality assessment metric by simply exploiting the pattern of human eye browsing on FVV. Over different quality contents of FVV, the participants eye-tracker recorded spatio-temporal gaze-data indicate more concentrated eye-traversing approach for relatively better quality. Thus, we calculate the Length, Angle, Pupil-size, and Gaze-duration features from the recorded gaze trajectory. The content and resolution invariant operation is carried out prior to synthesizing them using an adaptive weighted function to develop a new quality metric using eye traversal (QMET). Tested results reveal that the proposed QMET performs better than the SSIM and MOS in terms of assessing different aspects of coded video quality for a wide range of FVV contents.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 8th Pacific-Rim Symposium on Image and Video Technology, PSIVT 2017; Wuhan, China; 20th-24th November 2017; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10749 LNCS, p. 237-251
- Full Text:
- Reviewed:
- Description: The free viewpoint video (FVV) allows users to interactively control the viewpoint and generate new views of a dynamic scene from any 3D position for better 3D visual experience with depth perception. Multiview video coding exploits both texture and depth video information from various angles to encode a number of views to facilitate FVV. The usual practice for the single view or multiview quality assessment is characterized by evolving the objective quality assessment metrics due to their simplicity and real time applications such as the peak signal-to-noise ratio (PSNR) or the structural similarity index (SSIM). However, the PSNR or SSIM requires reference image for quality evaluation and could not be successfully employed in FVV as the new view in FVV does not have any reference view to compare with. Conversely, the widely used subjective estimator- mean opinion score (MOS) is often biased by the testing environment, viewers mode, domain knowledge, and many other factors that may actively influence on actual assessment. To address this limitation, in this work, we devise a no-reference subjective quality assessment metric by simply exploiting the pattern of human eye browsing on FVV. Over different quality contents of FVV, the participants eye-tracker recorded spatio-temporal gaze-data indicate more concentrated eye-traversing approach for relatively better quality. Thus, we calculate the Length, Angle, Pupil-size, and Gaze-duration features from the recorded gaze trajectory. The content and resolution invariant operation is carried out prior to synthesizing them using an adaptive weighted function to develop a new quality metric using eye traversal (QMET). Tested results reveal that the proposed QMET performs better than the SSIM and MOS in terms of assessing different aspects of coded video quality for a wide range of FVV contents.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
A novel quality metric using spatiotemporal correlational data of human eye maneuver
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 2017 International Conference on Digital Image Computing : Techniques and Applications, DICTA 2017; Sydney, Australia; 29th November-1st December 2017 Vol. 2017-December, p. 1-8
- Full Text:
- Reviewed:
- Description: The popularly used subjective estimator- mean opinion score (MOS) is often biased by the testing environment, viewers mode, domain expertise, and many other factors that may actively influence on actual assessment. We therefore, devise a no- reference subjective quality assessment metric by exploiting the nature of human eye browsing on videos. The participants' eye-tracker recorded gaze-data indicate more concentrated eye- traversing approach for relatively better quality. We calculate the Length, Angle, Pupil-size, and Gaze-duration features from the recorded gaze trajectory. The content and resolution invariant operation is carried out prior to synthesizing them using an adaptive weighted function to develop a new quality metric using eye traversal (QMET). Tested results reveal that the quality evaluation carried out by QMET demonstrates a strong correlation with the most widely used peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and the MOS.
- Description: DICTA 2017 - 2017 International Conference on Digital Image Computing: Techniques and Applications
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 2017 International Conference on Digital Image Computing : Techniques and Applications, DICTA 2017; Sydney, Australia; 29th November-1st December 2017 Vol. 2017-December, p. 1-8
- Full Text:
- Reviewed:
- Description: The popularly used subjective estimator- mean opinion score (MOS) is often biased by the testing environment, viewers mode, domain expertise, and many other factors that may actively influence on actual assessment. We therefore, devise a no- reference subjective quality assessment metric by exploiting the nature of human eye browsing on videos. The participants' eye-tracker recorded gaze-data indicate more concentrated eye- traversing approach for relatively better quality. We calculate the Length, Angle, Pupil-size, and Gaze-duration features from the recorded gaze trajectory. The content and resolution invariant operation is carried out prior to synthesizing them using an adaptive weighted function to develop a new quality metric using eye traversal (QMET). Tested results reveal that the quality evaluation carried out by QMET demonstrates a strong correlation with the most widely used peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and the MOS.
- Description: DICTA 2017 - 2017 International Conference on Digital Image Computing: Techniques and Applications
A novel video coding scheme using a scene adaptive non-parametric background model
- Chakraborty, Subrata, Paul, Manoranjan, Murshed, Manzur, Ali, Mortuza
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2014
- Type: Text , Conference paper
- Relation: 16th IEEE International Workshop on Multimedia Signal Processing, MMSP 2014 p. 1-6
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text:
- Reviewed:
- Description: Video coding techniques utilising background frames, provide better rate distortion performance by exploiting coding efficiency in uncovered background areas compared to the latest video coding standard. Parametric approaches such as the mixture of Gaussian (MoG) based background modeling has been widely used however they require prior knowledge about the test videos for parameter estimation. Recently introduced non-parametric (NP) based background modeling techniques successfully improved video coding performance through a HEVC integrated coding scheme. The inherent nature of the NP technique naturally exhibits superior performance in dynamic background scenarios compared to the MoG based technique without a priori knowledge of video data distribution. Although NP based coding schemes showed promising coding performances, they suffer from a number of key challenges - (a) determination of the optimal subset of training frames for generating a suitable background that can be used as a reference frame during coding, (b) incorporating dynamic changes in the background effectively after the initial background frame is generated, (c) managing frequent scene change leading to performance degradation, and (d) optimizing coding quality ratio between an I-frame and other frames under bit rate constraints. In this study we develop a new scene adaptive coding scheme using the NP based technique, capable of solving the current challenges by incorporating a new continuously updating background generation process. Extensive experimental results are also provided to validate the effectiveness of the new scheme.
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2014
- Type: Text , Conference paper
- Relation: 16th IEEE International Workshop on Multimedia Signal Processing, MMSP 2014 p. 1-6
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text:
- Reviewed:
- Description: Video coding techniques utilising background frames, provide better rate distortion performance by exploiting coding efficiency in uncovered background areas compared to the latest video coding standard. Parametric approaches such as the mixture of Gaussian (MoG) based background modeling has been widely used however they require prior knowledge about the test videos for parameter estimation. Recently introduced non-parametric (NP) based background modeling techniques successfully improved video coding performance through a HEVC integrated coding scheme. The inherent nature of the NP technique naturally exhibits superior performance in dynamic background scenarios compared to the MoG based technique without a priori knowledge of video data distribution. Although NP based coding schemes showed promising coding performances, they suffer from a number of key challenges - (a) determination of the optimal subset of training frames for generating a suitable background that can be used as a reference frame during coding, (b) incorporating dynamic changes in the background effectively after the initial background frame is generated, (c) managing frequent scene change leading to performance degradation, and (d) optimizing coding quality ratio between an I-frame and other frames under bit rate constraints. In this study we develop a new scene adaptive coding scheme using the NP based technique, capable of solving the current challenges by incorporating a new continuously updating background generation process. Extensive experimental results are also provided to validate the effectiveness of the new scheme.
A robust forgery detection method for copy-move and splicing attacks in images
- Islam, Mohammad, Karmakar, Gour, Kamruzzaman, Joarder, Murshed, Manzur
- Authors: Islam, Mohammad , Karmakar, Gour , Kamruzzaman, Joarder , Murshed, Manzur
- Date: 2020
- Type: Text , Journal article
- Relation: Electronics Vol. 9, no. 9 (2020), p. 1-22
- Full Text:
- Reviewed:
- Description: Internet of Things (IoT) image sensors, social media, and smartphones generate huge volumes of digital images every day. Easy availability and usability of photo editing tools have made forgery attacks, primarily splicing and copy-move attacks, effortless, causing cybercrimes to be on the rise. While several models have been proposed in the literature for detecting these attacks, the robustness of those models has not been investigated when (i) a low number of tampered images are available for model building or (ii) images from IoT sensors are distorted due to image rotation or scaling caused by unwanted or unexpected changes in sensors' physical set-up. Moreover, further improvement in detection accuracy is needed for real-word security management systems. To address these limitations, in this paper, an innovative image forgery detection method has been proposed based on Discrete Cosine Transformation (DCT) and Local Binary Pattern (LBP) and a new feature extraction method using the mean operator. First, images are divided into non-overlapping fixed size blocks and 2D block DCT is applied to capture changes due to image forgery. Then LBP is applied to the magnitude of the DCT array to enhance forgery artifacts. Finally, the mean value of a particular cell across all LBP blocks is computed, which yields a fixed number of features and presents a more computationally efficient method. Using Support Vector Machine (SVM), the proposed method has been extensively tested on four well known publicly available gray scale and color image forgery datasets, and additionally on an IoT based image forgery dataset that we built. Experimental results reveal the superiority of our proposed method over recent state-of-the-art methods in terms of widely used performance metrics and computational time and demonstrate robustness against low availability of forged training samples.
- Description: This research was funded by Research Priority Area (RPA) scholarship of Federation University Australia.
- Authors: Islam, Mohammad , Karmakar, Gour , Kamruzzaman, Joarder , Murshed, Manzur
- Date: 2020
- Type: Text , Journal article
- Relation: Electronics Vol. 9, no. 9 (2020), p. 1-22
- Full Text:
- Reviewed:
- Description: Internet of Things (IoT) image sensors, social media, and smartphones generate huge volumes of digital images every day. Easy availability and usability of photo editing tools have made forgery attacks, primarily splicing and copy-move attacks, effortless, causing cybercrimes to be on the rise. While several models have been proposed in the literature for detecting these attacks, the robustness of those models has not been investigated when (i) a low number of tampered images are available for model building or (ii) images from IoT sensors are distorted due to image rotation or scaling caused by unwanted or unexpected changes in sensors' physical set-up. Moreover, further improvement in detection accuracy is needed for real-word security management systems. To address these limitations, in this paper, an innovative image forgery detection method has been proposed based on Discrete Cosine Transformation (DCT) and Local Binary Pattern (LBP) and a new feature extraction method using the mean operator. First, images are divided into non-overlapping fixed size blocks and 2D block DCT is applied to capture changes due to image forgery. Then LBP is applied to the magnitude of the DCT array to enhance forgery artifacts. Finally, the mean value of a particular cell across all LBP blocks is computed, which yields a fixed number of features and presents a more computationally efficient method. Using Support Vector Machine (SVM), the proposed method has been extensively tested on four well known publicly available gray scale and color image forgery datasets, and additionally on an IoT based image forgery dataset that we built. Experimental results reveal the superiority of our proposed method over recent state-of-the-art methods in terms of widely used performance metrics and computational time and demonstrate robustness against low availability of forged training samples.
- Description: This research was funded by Research Priority Area (RPA) scholarship of Federation University Australia.
Adaptive weighted non-parametric background model for efficient video coding
- Chakraborty, Subrata, Paul, Manoranjan, Murshed, Manzur, Ali, Mortuza
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2017
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 226, no. (2017), p. 35-45
- Full Text:
- Reviewed:
- Description: Dynamic background frame based video coding using mixture of Gaussian (MoG) based background modelling has achieved better rate distortion performance compared to the H.264 standard. However, they suffer from high computation time, low coding efficiency for dynamic videos, and prior knowledge requirement of video content. In this paper, we introduce the application of the non-parametric (NP) background modelling approach for video coding domain. We present a novel background modelling technique, called weighted non-parametric (WNP) which balances the historical trend and the recent value of the pixel intensities adaptively based on the content and characteristics of any particular video. WNP is successfully embedded into the latest HEVC video coding standard for better rate-distortion performance. Moreover, a novel scene adaptive non-parametric (SANP) technique is also developed to handle video sequences with high dynamic background. Being non-parametric, the proposed techniques naturally exhibit superior performance in dynamic background modelling without a priori knowledge of video data distribution.
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2017
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 226, no. (2017), p. 35-45
- Full Text:
- Reviewed:
- Description: Dynamic background frame based video coding using mixture of Gaussian (MoG) based background modelling has achieved better rate distortion performance compared to the H.264 standard. However, they suffer from high computation time, low coding efficiency for dynamic videos, and prior knowledge requirement of video content. In this paper, we introduce the application of the non-parametric (NP) background modelling approach for video coding domain. We present a novel background modelling technique, called weighted non-parametric (WNP) which balances the historical trend and the recent value of the pixel intensities adaptively based on the content and characteristics of any particular video. WNP is successfully embedded into the latest HEVC video coding standard for better rate-distortion performance. Moreover, a novel scene adaptive non-parametric (SANP) technique is also developed to handle video sequences with high dynamic background. Being non-parametric, the proposed techniques naturally exhibit superior performance in dynamic background modelling without a priori knowledge of video data distribution.
Advances in multimedia sensor networks for health-care and related applications
- Hossain, M. Shamim, Pathan, Al-Sakib, Goebel, Stefan, Rahman, Shawon, Murshed, Manzur
- Authors: Hossain, M. Shamim , Pathan, Al-Sakib , Goebel, Stefan , Rahman, Shawon , Murshed, Manzur
- Date: 2015
- Type: Text , Journal article , Editorial
- Relation: International Journal of Distributed Sensor Networks Vol. 2015, no. (2015), p. 1-2
- Full Text:
- Reviewed:
- Description: Multimedia sensor services and technologies play an important role in seamlessly providing andmanaging health, sports, and other services to anyone, everywhere, and anytime. Media sensors are usually equipped with cameras, microphones, and other devices that produce media content and services. Such services and technologies enable caregivers and related professionals to have immediate access to required information for efficient decision making. Since media sensing technology development is growing, many research opportunities are emerging in a broad spectrum of application domains.
- Authors: Hossain, M. Shamim , Pathan, Al-Sakib , Goebel, Stefan , Rahman, Shawon , Murshed, Manzur
- Date: 2015
- Type: Text , Journal article , Editorial
- Relation: International Journal of Distributed Sensor Networks Vol. 2015, no. (2015), p. 1-2
- Full Text:
- Reviewed:
- Description: Multimedia sensor services and technologies play an important role in seamlessly providing andmanaging health, sports, and other services to anyone, everywhere, and anytime. Media sensors are usually equipped with cameras, microphones, and other devices that produce media content and services. Such services and technologies enable caregivers and related professionals to have immediate access to required information for efficient decision making. Since media sensing technology development is growing, many research opportunities are emerging in a broad spectrum of application domains.
Adversarial network with multiple classifiers for open set domain adaptation
- Shermin, Tasfia, Lu, Guojun, Teng, Shyh, Murshed, Manzur, Sohel, Ferdous
- Authors: Shermin, Tasfia , Lu, Guojun , Teng, Shyh , Murshed, Manzur , Sohel, Ferdous
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 23, no. (2021), p. 2732-2744
- Full Text:
- Reviewed:
- Description: Domain adaptation aims to transfer knowledge from a domain with adequate labeled samples to a domain with scarce labeled samples. Prior research has introduced various open set domain adaptation settings in the literature to extend the applications of domain adaptation methods in real-world scenarios. This paper focuses on the type of open set domain adaptation setting where the target domain has both private ('unknown classes') label space and the shared ('known classes') label space. However, the source domain only has the 'known classes' label space. Prevalent distribution-matching domain adaptation methods are inadequate in such a setting that demands adaptation from a smaller source domain to a larger and diverse target domain with more classes. For addressing this specific open set domain adaptation setting, prior research introduces a domain adversarial model that uses a fixed threshold for distinguishing known from unknown target samples and lacks at handling negative transfers. We extend their adversarial model and propose a novel adversarial domain adaptation model with multiple auxiliary classifiers. The proposed multi-classifier structure introduces a weighting module that evaluates distinctive domain characteristics for assigning the target samples with weights which are more representative to whether they are likely to belong to the known and unknown classes to encourage positive transfers during adversarial training and simultaneously reduces the domain gap between the shared classes of the source and target domains. A thorough experimental investigation shows that our proposed method outperforms existing domain adaptation methods on a number of domain adaptation datasets. © 1999-2012 IEEE.
- Authors: Shermin, Tasfia , Lu, Guojun , Teng, Shyh , Murshed, Manzur , Sohel, Ferdous
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 23, no. (2021), p. 2732-2744
- Full Text:
- Reviewed:
- Description: Domain adaptation aims to transfer knowledge from a domain with adequate labeled samples to a domain with scarce labeled samples. Prior research has introduced various open set domain adaptation settings in the literature to extend the applications of domain adaptation methods in real-world scenarios. This paper focuses on the type of open set domain adaptation setting where the target domain has both private ('unknown classes') label space and the shared ('known classes') label space. However, the source domain only has the 'known classes' label space. Prevalent distribution-matching domain adaptation methods are inadequate in such a setting that demands adaptation from a smaller source domain to a larger and diverse target domain with more classes. For addressing this specific open set domain adaptation setting, prior research introduces a domain adversarial model that uses a fixed threshold for distinguishing known from unknown target samples and lacks at handling negative transfers. We extend their adversarial model and propose a novel adversarial domain adaptation model with multiple auxiliary classifiers. The proposed multi-classifier structure introduces a weighting module that evaluates distinctive domain characteristics for assigning the target samples with weights which are more representative to whether they are likely to belong to the known and unknown classes to encourage positive transfers during adversarial training and simultaneously reduces the domain gap between the shared classes of the source and target domains. A thorough experimental investigation shows that our proposed method outperforms existing domain adaptation methods on a number of domain adaptation datasets. © 1999-2012 IEEE.
An efficient RANSAC hypothesis evaluation using sufficient statistics for RGB-D pose estimation
- Senthooran, Ilankalkone, Murshed, Manzur, Barca, Jan, Kamruzzaman, Joarder, Chung, Hoam
- Authors: Senthooran, Ilankalkone , Murshed, Manzur , Barca, Jan , Kamruzzaman, Joarder , Chung, Hoam
- Date: 2019
- Type: Text , Journal article
- Relation: Autonomous Robots Vol. 43, no. 5 (2019), p. 1257-1270
- Full Text:
- Reviewed:
- Description: Achieving autonomous flight in GPS-denied environments begins with pose estimation in three-dimensional space, and this is much more challenging in an MAV in a swarm robotic system due to limited computational resources. In vision-based pose estimation, outlier detection is the most time-consuming step. This usually involves a RANSAC procedure using the reprojection-error method for hypothesis evaluation. Realignment-based hypothesis evaluation method is observed to be more accurate, but the considerably slower speed makes it unsuitable for robots with limited resources. We use sufficient statistics of least-squares minimisation to speed up this process. The additive nature of these sufficient statistics makes it possible to compute pose estimates in each evaluation by reusing previously computed statistics. Thus estimates need not be calculated from scratch each time. The proposed method is tested on standard RANSAC, Preemptive RANSAC and R-RANSAC using benchmark datasets. The results show that the use of sufficient statistics speeds up the outlier detection process with realignment hypothesis evaluation for all RANSAC variants, achieving an execution speed of up to 6.72 times.
- Authors: Senthooran, Ilankalkone , Murshed, Manzur , Barca, Jan , Kamruzzaman, Joarder , Chung, Hoam
- Date: 2019
- Type: Text , Journal article
- Relation: Autonomous Robots Vol. 43, no. 5 (2019), p. 1257-1270
- Full Text:
- Reviewed:
- Description: Achieving autonomous flight in GPS-denied environments begins with pose estimation in three-dimensional space, and this is much more challenging in an MAV in a swarm robotic system due to limited computational resources. In vision-based pose estimation, outlier detection is the most time-consuming step. This usually involves a RANSAC procedure using the reprojection-error method for hypothesis evaluation. Realignment-based hypothesis evaluation method is observed to be more accurate, but the considerably slower speed makes it unsuitable for robots with limited resources. We use sufficient statistics of least-squares minimisation to speed up this process. The additive nature of these sufficient statistics makes it possible to compute pose estimates in each evaluation by reusing previously computed statistics. Thus estimates need not be calculated from scratch each time. The proposed method is tested on standard RANSAC, Preemptive RANSAC and R-RANSAC using benchmark datasets. The results show that the use of sufficient statistics speeds up the outlier detection process with realignment hypothesis evaluation for all RANSAC variants, achieving an execution speed of up to 6.72 times.
An efficient video coding technique using a novel non-parametric background model
- Chakraborty, Subrata, Paul, Manoranjan, Murshed, Manzur, Ali, Mortuza
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2014
- Type: Text , Conference proceedings
- Relation: 2014 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2014; Chengdu; China; 14th-18th July 2014 p. 1-6
- Full Text:
- Reviewed:
- Description: Video coding technique with a background frame, extracted from mixture of Gaussian (MoG) based background modeling, provides better rate distortion performance by exploiting coding efficiency in uncovered background areas compared to the latest video coding standard. However, it suffers from high computation time, low coding efficiency for dynamic videos, and prior knowledge requirement of video content. In this paper, we present a novel adaptive weighted non-parametric (WNP) background modeling technique and successfully embed it into HEVC video coding standard. Being non-parametric (NP), the proposed technique naturally exhibits superior performance in dynamic background scenarios compared to MoG-based technique without a priori knowledge of video data distribution. In addition, the WNP technique significantly reduces noise-related drawbacks of existing NP techniques to provide better quality video coding with much lower computation time as demonstrated through extensive comparative studies against NP, MoG and HEVC techniques.
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2014
- Type: Text , Conference proceedings
- Relation: 2014 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2014; Chengdu; China; 14th-18th July 2014 p. 1-6
- Full Text:
- Reviewed:
- Description: Video coding technique with a background frame, extracted from mixture of Gaussian (MoG) based background modeling, provides better rate distortion performance by exploiting coding efficiency in uncovered background areas compared to the latest video coding standard. However, it suffers from high computation time, low coding efficiency for dynamic videos, and prior knowledge requirement of video content. In this paper, we present a novel adaptive weighted non-parametric (WNP) background modeling technique and successfully embed it into HEVC video coding standard. Being non-parametric (NP), the proposed technique naturally exhibits superior performance in dynamic background scenarios compared to MoG-based technique without a priori knowledge of video data distribution. In addition, the WNP technique significantly reduces noise-related drawbacks of existing NP techniques to provide better quality video coding with much lower computation time as demonstrated through extensive comparative studies against NP, MoG and HEVC techniques.
Bidirectional mapping coupled GAN for generalized zero-shot learning
- Shermin, Tasfia, Teng, Shyh, Sohel, Ferdous, Murshed, Manzur, Lu, Guojun
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 31, no. (2022), p. 721-733
- Full Text:
- Reviewed:
- Description: Bidirectional mapping-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen-unseen classes and preserving the distinction between seen-unseen classes is crucial for GZSL methods. However, existing methods only learn the underlying distribution of seen data, although unseen class semantics are available in the GZSL problem setting. Most methods neglect retaining seen-unseen classes distinction and use the learned distribution to recognize seen and unseen data. Consequently, they do not perform well. In this work, we utilize the available unseen class semantics alongside seen class semantics and learn joint distribution through a strong visual-semantic coupling. We propose a bidirectional mapping coupled generative adversarial network (BMCoGAN) by extending the concept of the coupled generative adversarial network into a bidirectional mapping model. We further integrate a Wasserstein generative adversarial optimization to supervise the joint distribution learning. We design a loss optimization for retaining distinctive information of seen-unseen classes in the synthesized features and reducing bias towards seen classes, which pushes synthesized seen features towards real seen features and pulls synthesized unseen features away from real seen features. We evaluate BMCoGAN on benchmark datasets and demonstrate its superior performance against contemporary methods. © 1992-2012 IEEE.
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 31, no. (2022), p. 721-733
- Full Text:
- Reviewed:
- Description: Bidirectional mapping-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen-unseen classes and preserving the distinction between seen-unseen classes is crucial for GZSL methods. However, existing methods only learn the underlying distribution of seen data, although unseen class semantics are available in the GZSL problem setting. Most methods neglect retaining seen-unseen classes distinction and use the learned distribution to recognize seen and unseen data. Consequently, they do not perform well. In this work, we utilize the available unseen class semantics alongside seen class semantics and learn joint distribution through a strong visual-semantic coupling. We propose a bidirectional mapping coupled generative adversarial network (BMCoGAN) by extending the concept of the coupled generative adversarial network into a bidirectional mapping model. We further integrate a Wasserstein generative adversarial optimization to supervise the joint distribution learning. We design a loss optimization for retaining distinctive information of seen-unseen classes in the synthesized features and reducing bias towards seen classes, which pushes synthesized seen features towards real seen features and pulls synthesized unseen features away from real seen features. We evaluate BMCoGAN on benchmark datasets and demonstrate its superior performance against contemporary methods. © 1992-2012 IEEE.
Comparative analysis of machine and deep learning models for soil properties prediction from hyperspectral visual band
- Datta, Dristi, Paul, Manoranjan, Murshed, Manzur, Teng, Shyh Wei, Schmidtke, Leigh
- Authors: Datta, Dristi , Paul, Manoranjan , Murshed, Manzur , Teng, Shyh Wei , Schmidtke, Leigh
- Date: 2023
- Type: Text , Journal article
- Relation: Environments Vol. 10, no. 5 (2023), p. 77
- Full Text:
- Reviewed:
- Description: Estimating various properties of soil, including moisture, carbon, and nitrogen, is crucial for studying their correlation with plant health and food production. However, conventional methods such as oven-drying and chemical analysis are laborious, expensive, and only feasible for a limited land area. With the advent of remote sensing technologies like multi/hyperspectral imaging, it is now possible to predict soil properties non-invasive and cost-effectively for a large expanse of bare land. Recent research shows the possibility of predicting those soil contents from a wide range of hyperspectral data using good prediction algorithms. However, these kinds of hyperspectral sensors are expensive and not widely available. Therefore, this paper investigates different machine and deep learning techniques to predict soil nutrient properties using only the red (R), green (G), and blue (B) bands data to propose a suitable machine/deep learning model that can be used as a rapid soil test. Another objective of this research is to observe and compare the prediction accuracy in three cases i. hyperspectral band ii. full spectrum of the visual band, and iii. three-channel of RGB band and provide a guideline to the user on which spectrum information they should use to predict those soil properties. The outcome of this research helps to develop a mobile application that is easy to use for a quick soil test. This research also explores learning-based algorithms with significant feature combinations and their performance comparisons in predicting soil properties from visual band data. For this, we also explore the impact of dimensional reduction (i.e., principal component analysis) and transformations (i.e., empirical mode decomposition) of features. The results show that the proposed model can comparably predict the soil contents from the three-channel RGB data.
- Authors: Datta, Dristi , Paul, Manoranjan , Murshed, Manzur , Teng, Shyh Wei , Schmidtke, Leigh
- Date: 2023
- Type: Text , Journal article
- Relation: Environments Vol. 10, no. 5 (2023), p. 77
- Full Text:
- Reviewed:
- Description: Estimating various properties of soil, including moisture, carbon, and nitrogen, is crucial for studying their correlation with plant health and food production. However, conventional methods such as oven-drying and chemical analysis are laborious, expensive, and only feasible for a limited land area. With the advent of remote sensing technologies like multi/hyperspectral imaging, it is now possible to predict soil properties non-invasive and cost-effectively for a large expanse of bare land. Recent research shows the possibility of predicting those soil contents from a wide range of hyperspectral data using good prediction algorithms. However, these kinds of hyperspectral sensors are expensive and not widely available. Therefore, this paper investigates different machine and deep learning techniques to predict soil nutrient properties using only the red (R), green (G), and blue (B) bands data to propose a suitable machine/deep learning model that can be used as a rapid soil test. Another objective of this research is to observe and compare the prediction accuracy in three cases i. hyperspectral band ii. full spectrum of the visual band, and iii. three-channel of RGB band and provide a guideline to the user on which spectrum information they should use to predict those soil properties. The outcome of this research helps to develop a mobile application that is easy to use for a quick soil test. This research also explores learning-based algorithms with significant feature combinations and their performance comparisons in predicting soil properties from visual band data. For this, we also explore the impact of dimensional reduction (i.e., principal component analysis) and transformations (i.e., empirical mode decomposition) of features. The results show that the proposed model can comparably predict the soil contents from the three-channel RGB data.
Cuboid colour image segmentation using intuitive distance measure
- Tania, Sheikh, Murshed, Manzur, Teng, Shyh, Karmakar, Gour
- Authors: Tania, Sheikh , Murshed, Manzur , Teng, Shyh , Karmakar, Gour
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 Vol. 2018-November, p. 1-6
- Full Text:
- Reviewed:
- Description: In this paper, an improved algorithm for cuboid image segmentation is proposed. To address the two main limitations of the recently proposed cuboid segmentation algorithm, the improved algorithm substitutes colour quantization in HCL colour space with infinity norm distance in RGB colour space along with a different way to impose area thresholding. We also propose a new metric to evaluate the quality of segmentation. Experimental results show that the proposed cuboid segmentation algorithm significantly outperforms the existing cuboid segmentation algorithm in terms of quality of segmentation.
- Description: International Conference Image and Vision Computing New Zealand
- Authors: Tania, Sheikh , Murshed, Manzur , Teng, Shyh , Karmakar, Gour
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 International Conference on Image and Vision Computing New Zealand, IVCNZ 2018; Auckland, New Zealand; 19th-21st November 2018 Vol. 2018-November, p. 1-6
- Full Text:
- Reviewed:
- Description: In this paper, an improved algorithm for cuboid image segmentation is proposed. To address the two main limitations of the recently proposed cuboid segmentation algorithm, the improved algorithm substitutes colour quantization in HCL colour space with infinity norm distance in RGB colour space along with a different way to impose area thresholding. We also propose a new metric to evaluate the quality of segmentation. Experimental results show that the proposed cuboid segmentation algorithm significantly outperforms the existing cuboid segmentation algorithm in terms of quality of segmentation.
- Description: International Conference Image and Vision Computing New Zealand
Demand-driven movement strategy for moving beacons in distributed sensor localization
- Authors: Murshed, Manzur
- Date: 2011
- Type: Text , Conference paper
- Relation: International Conference on Computational Science (ICCS)
- Full Text:
- Reviewed:
- Description: n a wireless sensor network, range-free localization with a moving beacon can reduce susceptibility to communication noises while concomitantly eliminate need for large number of expensive anchor nodes that are vulnerable to malicious attacks. This paper presents a moving beacon aided range-free localization technique, which is capable of estimating the location of a sensor with high accuracy. A novel distributed localization scheme is designed to optimally determine beacon movement strategy according to user demand. Superiority of this scheme to the state-of-the-art has been established in terms of location estimation quality, measured by the theoretical expected maximum error and simulated mean error while optimizing the beacon location density or traversal path length.
- Authors: Murshed, Manzur
- Date: 2011
- Type: Text , Conference paper
- Relation: International Conference on Computational Science (ICCS)
- Full Text:
- Reviewed:
- Description: n a wireless sensor network, range-free localization with a moving beacon can reduce susceptibility to communication noises while concomitantly eliminate need for large number of expensive anchor nodes that are vulnerable to malicious attacks. This paper presents a moving beacon aided range-free localization technique, which is capable of estimating the location of a sensor with high accuracy. A novel distributed localization scheme is designed to optimally determine beacon movement strategy according to user demand. Superiority of this scheme to the state-of-the-art has been established in terms of location estimation quality, measured by the theoretical expected maximum error and simulated mean error while optimizing the beacon location density or traversal path length.
Depth sequence coding with hierarchical partitioning and spatial-domain quantization
- Shahriyar, Shampa, Murshed, Manzur, Ali, Mortuza, Paul, Manoranjan
- Authors: Shahriyar, Shampa , Murshed, Manzur , Ali, Mortuza , Paul, Manoranjan
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 30, no. 3 (2020), p. 835-849
- Full Text:
- Reviewed:
- Description: Depth coding in 3D-HEVC deforms object shapes due to block-level edge-approximation and lacks efficient techniques to exploit the statistical redundancy, due to the frame-level clustering tendency in depth data, for higher coding gain at near-lossless quality. This paper presents a standalone mono-view depth sequence coder, which preserves edges implicitly by limiting quantization to the spatial-domain and exploits the frame-level clustering tendency efficiently with a novel binary tree-based decomposition (BTBD) technique. The BTBD can exploit the statistical redundancy in frame-level syntax, motion components, and residuals efficiently with fewer block-level prediction/coding modes and simpler context modeling for context-adaptive arithmetic coding. Compared with the depth coder in 3D-HEVC, the proposed one has achieved significantly lower bitrate at lossless to near-lossless quality range for mono-view coding and rendered superior quality synthetic views from the depth maps, compressed at the same bitrate, and the corresponding texture frames. © 1991-2012 IEEE.
- Authors: Shahriyar, Shampa , Murshed, Manzur , Ali, Mortuza , Paul, Manoranjan
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 30, no. 3 (2020), p. 835-849
- Full Text:
- Reviewed:
- Description: Depth coding in 3D-HEVC deforms object shapes due to block-level edge-approximation and lacks efficient techniques to exploit the statistical redundancy, due to the frame-level clustering tendency in depth data, for higher coding gain at near-lossless quality. This paper presents a standalone mono-view depth sequence coder, which preserves edges implicitly by limiting quantization to the spatial-domain and exploits the frame-level clustering tendency efficiently with a novel binary tree-based decomposition (BTBD) technique. The BTBD can exploit the statistical redundancy in frame-level syntax, motion components, and residuals efficiently with fewer block-level prediction/coding modes and simpler context modeling for context-adaptive arithmetic coding. Compared with the depth coder in 3D-HEVC, the proposed one has achieved significantly lower bitrate at lossless to near-lossless quality range for mono-view coding and rendered superior quality synthetic views from the depth maps, compressed at the same bitrate, and the corresponding texture frames. © 1991-2012 IEEE.
Depth-based sampling and steering constraints for memoryless local planners
- Nguyen, Binh, Nguyen, Linh, Choudhury, Tanveer, Keogh, Kathleen, Murshed, Manzur
- Authors: Nguyen, Binh , Nguyen, Linh , Choudhury, Tanveer , Keogh, Kathleen , Murshed, Manzur
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Intelligent and Robotic Systems: Theory and Applications Vol. 109, no. 3 (2023), p.
- Full Text:
- Reviewed:
- Description: By utilizing only depth information, the paper introduces a novel two-stage planning approach that enhances computational efficiency and planning performances for memoryless local planners. First, a depth-based sampling technique is proposed to identify and eliminate a specific type of in-collision trajectories among sampled candidates. Specifically, all trajectories that have obscured endpoints are found through querying the depth values and will then be excluded from the sampled set, which can significantly reduce the computational workload required in collision checking. Subsequently, we apply a tailored local planning algorithm that employs a direction cost function and a depth-based steering mechanism to prevent the robot from being trapped in local minima. Our planning algorithm is theoretically proven to be complete in convex obstacle scenarios. To validate the effectiveness of our DEpth-based both Sampling and Steering (DESS) approaches, we conducted experiments in simulated environments where a quadrotor flew through cluttered regions with multiple various-sized obstacles. The experimental results show that DESS significantly reduces computation time in local planning compared to the uniform sampling method, resulting in the planned trajectory with a lower minimized cost. More importantly, our success rates for navigation to different destinations in testing scenarios are improved considerably compared to the fixed-yawing approach. © 2023, The Author(s).
- Authors: Nguyen, Binh , Nguyen, Linh , Choudhury, Tanveer , Keogh, Kathleen , Murshed, Manzur
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Intelligent and Robotic Systems: Theory and Applications Vol. 109, no. 3 (2023), p.
- Full Text:
- Reviewed:
- Description: By utilizing only depth information, the paper introduces a novel two-stage planning approach that enhances computational efficiency and planning performances for memoryless local planners. First, a depth-based sampling technique is proposed to identify and eliminate a specific type of in-collision trajectories among sampled candidates. Specifically, all trajectories that have obscured endpoints are found through querying the depth values and will then be excluded from the sampled set, which can significantly reduce the computational workload required in collision checking. Subsequently, we apply a tailored local planning algorithm that employs a direction cost function and a depth-based steering mechanism to prevent the robot from being trapped in local minima. Our planning algorithm is theoretically proven to be complete in convex obstacle scenarios. To validate the effectiveness of our DEpth-based both Sampling and Steering (DESS) approaches, we conducted experiments in simulated environments where a quadrotor flew through cluttered regions with multiple various-sized obstacles. The experimental results show that DESS significantly reduces computation time in local planning compared to the uniform sampling method, resulting in the planned trajectory with a lower minimized cost. More importantly, our success rates for navigation to different destinations in testing scenarios are improved considerably compared to the fixed-yawing approach. © 2023, The Author(s).
Detecting splicing and copy-move attacks in color images
- Islam, Mohammad, Karmakar, Gour, Kamruzzaman, Joarder, Murshed, Manzur, Kahandawa, Gayan, Parvin, Nahida
- Authors: Islam, Mohammad , Karmakar, Gour , Kamruzzaman, Joarder , Murshed, Manzur , Kahandawa, Gayan , Parvin, Nahida
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018; Canberra, Australia; 10th-13th December 2018 p. 1-7
- Full Text:
- Reviewed:
- Description: Image sensors are generating limitless digital images every day. Image forgery like splicing and copy-move are very common type of attacks that are easy to execute using sophisticated photo editing tools. As a result, digital forensics has attracted much attention to identify such tampering on digital images. In this paper, a passive (blind) image tampering identification method based on Discrete Cosine Transformation (DCT) and Local Binary Pattern (LBP) has been proposed. First, the chroma components of an image is divided into fixed sized non-overlapping blocks and 2D block DCT is applied to identify the changes due to forgery in local frequency distribution of the image. Then a texture descriptor, LBP is applied on the magnitude component of the 2D-DCT array to enhance the artifacts introduced by the tampering operation. The resulting LBP image is again divided into non-overlapping blocks. Finally, summations of corresponding inter-cell values of all the LBP blocks are computed and arranged as a feature vector. These features are fed into a Support Vector Machine (SVM) with Radial Basis Function (RBF) as kernel to distinguish forged images from authentic ones. The proposed method has been experimented extensively on three publicly available well-known image splicing and copy-move detection benchmark datasets of color images. Results demonstrate the superiority of the proposed method over recently proposed state-of-the-art approaches in terms of well accepted performance metrics such as accuracy, area under ROC curve and others.
- Description: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018
- Authors: Islam, Mohammad , Karmakar, Gour , Kamruzzaman, Joarder , Murshed, Manzur , Kahandawa, Gayan , Parvin, Nahida
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018; Canberra, Australia; 10th-13th December 2018 p. 1-7
- Full Text:
- Reviewed:
- Description: Image sensors are generating limitless digital images every day. Image forgery like splicing and copy-move are very common type of attacks that are easy to execute using sophisticated photo editing tools. As a result, digital forensics has attracted much attention to identify such tampering on digital images. In this paper, a passive (blind) image tampering identification method based on Discrete Cosine Transformation (DCT) and Local Binary Pattern (LBP) has been proposed. First, the chroma components of an image is divided into fixed sized non-overlapping blocks and 2D block DCT is applied to identify the changes due to forgery in local frequency distribution of the image. Then a texture descriptor, LBP is applied on the magnitude component of the 2D-DCT array to enhance the artifacts introduced by the tampering operation. The resulting LBP image is again divided into non-overlapping blocks. Finally, summations of corresponding inter-cell values of all the LBP blocks are computed and arranged as a feature vector. These features are fed into a Support Vector Machine (SVM) with Radial Basis Function (RBF) as kernel to distinguish forged images from authentic ones. The proposed method has been experimented extensively on three publicly available well-known image splicing and copy-move detection benchmark datasets of color images. Results demonstrate the superiority of the proposed method over recently proposed state-of-the-art approaches in terms of well accepted performance metrics such as accuracy, area under ROC curve and others.
- Description: 2018 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2018
Efficient HEVC scheme using motion type categorization
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2014
- Type: Text , Conference proceedings
- Relation: 10th International Conference on emerging Networking EXperiments and Technologies (CoNEXT); Sydney, Australia; 2nd-5th December 2014; published in Proceedings of the 2014 Workshop on Design, Quality and Deployment of Adaptive Video Streaming p. 41-42
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text:
- Reviewed:
- Description: High Efficiency Video Coding (HEVC) standard introduces a number of innovative tools which can reduce approximately 50% bit-rate compared to its predecessor H.264/AVC at the same perceptual video quality whereas the computational time has increased multiple times. To reduce the encoding time while preserving the expected video quality has become a real challenge today for video transmission and streaming especially using low-powered devices. Motion estimation (ME) and motion compensation (MC) using variable-size blocks (i.e., intermodes) require 60-80% of total computational time. In this paper we propose a new efficient intermode selection technique based on phase correlation and incorporate into HEVC framework to predict ME and MC modes and perform faster intermode selection based on three dissimilar motion types in different videos. Instead of exploring all the modes exhaustively we select a subset of modes using motion type and the final mode is selected based on the Lagrangian cost function. The experimental results show that compared to HEVC the average computational time can be downscaled by 34% while providing the similar rate-distortion (RD) performance.
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2014
- Type: Text , Conference proceedings
- Relation: 10th International Conference on emerging Networking EXperiments and Technologies (CoNEXT); Sydney, Australia; 2nd-5th December 2014; published in Proceedings of the 2014 Workshop on Design, Quality and Deployment of Adaptive Video Streaming p. 41-42
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text:
- Reviewed:
- Description: High Efficiency Video Coding (HEVC) standard introduces a number of innovative tools which can reduce approximately 50% bit-rate compared to its predecessor H.264/AVC at the same perceptual video quality whereas the computational time has increased multiple times. To reduce the encoding time while preserving the expected video quality has become a real challenge today for video transmission and streaming especially using low-powered devices. Motion estimation (ME) and motion compensation (MC) using variable-size blocks (i.e., intermodes) require 60-80% of total computational time. In this paper we propose a new efficient intermode selection technique based on phase correlation and incorporate into HEVC framework to predict ME and MC modes and perform faster intermode selection based on three dissimilar motion types in different videos. Instead of exploring all the modes exhaustively we select a subset of modes using motion type and the final mode is selected based on the Lagrangian cost function. The experimental results show that compared to HEVC the average computational time can be downscaled by 34% while providing the similar rate-distortion (RD) performance.
Efficient high-resolution video compression scheme using background and foreground layers
- Afsana, Fariha, Paul, Manoranjan, Murshed, Manzur, Taubman, David
- Authors: Afsana, Fariha , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 157411-157421
- Full Text:
- Reviewed:
- Description: Video coding using dynamic background frame achieves better compression compared to the traditional techniques by encoding background and foreground separately. This process reduces coding bits for the overall frame significantly; however, encoding background still requires many bits that can be compressed further for achieving better coding efficiency. The cuboid coding framework has been proven to be one of the most effective methods of image compression which exploits homogeneous pixel correlation within a frame and has better alignment with object boundary compared to traditional block-based coding. In a video sequence, the cuboid-based frame partitioning varies with the changes of the foreground. However, since the background remains static for a group of pictures, the cuboid coding exploits better spatial pixel homogeneity. In this work, the impact of cuboid coding on the background frame for high-resolution videos (Ultra-High-Definition (UHD) and 360-degree videos) is investigated using the multilayer framework of SHVC. After the cuboid partitioning, the method of coarse frame generation has been improved with a novel idea by keeping human-visual sensitive information. Unlike the traditional SHVC scheme, in the proposed method, cuboid coded background and the foreground are encoded in separate layers in an implicit manner. Simulation results show that the proposed video coding method achieves an average BD-Rate reduction of 26.69% and BD-PSNR gain of 1.51 dB against SHVC with significant encoding time reduction for both UHD and 360 videos. It also achieves an average of 13.88% BD-Rate reduction and 0.78 dB BD-PSNR gain compared to the existing relevant method proposed by X. Hoang Van. © 2013 IEEE.
- Authors: Afsana, Fariha , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 157411-157421
- Full Text:
- Reviewed:
- Description: Video coding using dynamic background frame achieves better compression compared to the traditional techniques by encoding background and foreground separately. This process reduces coding bits for the overall frame significantly; however, encoding background still requires many bits that can be compressed further for achieving better coding efficiency. The cuboid coding framework has been proven to be one of the most effective methods of image compression which exploits homogeneous pixel correlation within a frame and has better alignment with object boundary compared to traditional block-based coding. In a video sequence, the cuboid-based frame partitioning varies with the changes of the foreground. However, since the background remains static for a group of pictures, the cuboid coding exploits better spatial pixel homogeneity. In this work, the impact of cuboid coding on the background frame for high-resolution videos (Ultra-High-Definition (UHD) and 360-degree videos) is investigated using the multilayer framework of SHVC. After the cuboid partitioning, the method of coarse frame generation has been improved with a novel idea by keeping human-visual sensitive information. Unlike the traditional SHVC scheme, in the proposed method, cuboid coded background and the foreground are encoded in separate layers in an implicit manner. Simulation results show that the proposed video coding method achieves an average BD-Rate reduction of 26.69% and BD-PSNR gain of 1.51 dB against SHVC with significant encoding time reduction for both UHD and 360 videos. It also achieves an average of 13.88% BD-Rate reduction and 0.78 dB BD-PSNR gain compared to the existing relevant method proposed by X. Hoang Van. © 2013 IEEE.
Efficient video coding using visual sensitive information for HEVC coding standard
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2018
- Type: Text , Journal article
- Relation: IEEE Access Vol. 6, no. (2018), p. 75695-75708
- Full Text:
- Reviewed:
- Description: The latest high efficiency video coding (HEVC) standard introduces a large number of inter-mode block partitioning modes. The HEVC reference test model (HM) uses partially exhaustive tree-structured mode selection, which still explores a large number of prediction unit (PU) modes for a coding unit (CU). This impacts on encoding time rise which deprives a number of electronic devices having limited processing resources to use various features of HEVC. By analyzing the homogeneity, residual, and different statistical correlation among modes, many researchers speed-up the encoding process through the number of PU mode reduction. However, these approaches could not demonstrate the similar rate-distortion (RD) performance with the HM due to their dependency on existing Lagrangian cost function (LCF) within the HEVC framework. In this paper, to avoid the complete dependency on LCF in the initial phase, we exploit visual sensitive foreground motion and spatial salient metric (FMSSM) in a block. To capture its motion and saliency features, we use the dynamic background and visual saliency modeling, respectively. According to the FMSSM values, a subset of PU modes is then explored for encoding the CU. This preprocessing phase is independent from the existing LCF. As the proposed coding technique further reduces the number of PU modes using two simple criteria (i.e., motion and saliency), it outperforms the HM in terms of encoding time reduction. As it also encodes the uncovered and static background areas using the dynamic background frame as a substituted reference frame, it does not sacrifice quality. Tested results reveal that the proposed method achieves 32% average encoding time reduction of the HM without any quality loss for a wide range of videos.
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2018
- Type: Text , Journal article
- Relation: IEEE Access Vol. 6, no. (2018), p. 75695-75708
- Full Text:
- Reviewed:
- Description: The latest high efficiency video coding (HEVC) standard introduces a large number of inter-mode block partitioning modes. The HEVC reference test model (HM) uses partially exhaustive tree-structured mode selection, which still explores a large number of prediction unit (PU) modes for a coding unit (CU). This impacts on encoding time rise which deprives a number of electronic devices having limited processing resources to use various features of HEVC. By analyzing the homogeneity, residual, and different statistical correlation among modes, many researchers speed-up the encoding process through the number of PU mode reduction. However, these approaches could not demonstrate the similar rate-distortion (RD) performance with the HM due to their dependency on existing Lagrangian cost function (LCF) within the HEVC framework. In this paper, to avoid the complete dependency on LCF in the initial phase, we exploit visual sensitive foreground motion and spatial salient metric (FMSSM) in a block. To capture its motion and saliency features, we use the dynamic background and visual saliency modeling, respectively. According to the FMSSM values, a subset of PU modes is then explored for encoding the CU. This preprocessing phase is independent from the existing LCF. As the proposed coding technique further reduces the number of PU modes using two simple criteria (i.e., motion and saliency), it outperforms the HM in terms of encoding time reduction. As it also encodes the uncovered and static background areas using the dynamic background frame as a substituted reference frame, it does not sacrifice quality. Tested results reveal that the proposed method achieves 32% average encoding time reduction of the HM without any quality loss for a wide range of videos.