Comparative analysis of machine and deep learning models for soil properties prediction from hyperspectral visual band
- Datta, Dristi, Paul, Manoranjan, Murshed, Manzur, Teng, Shyh Wei, Schmidtke, Leigh
- Authors: Datta, Dristi , Paul, Manoranjan , Murshed, Manzur , Teng, Shyh Wei , Schmidtke, Leigh
- Date: 2023
- Type: Text , Journal article
- Relation: Environments Vol. 10, no. 5 (2023), p. 77
- Full Text:
- Reviewed:
- Description: Estimating various properties of soil, including moisture, carbon, and nitrogen, is crucial for studying their correlation with plant health and food production. However, conventional methods such as oven-drying and chemical analysis are laborious, expensive, and only feasible for a limited land area. With the advent of remote sensing technologies like multi/hyperspectral imaging, it is now possible to predict soil properties non-invasively and cost-effectively over a large expanse of bare land. Recent research shows that these soil contents can be predicted from a wide range of hyperspectral data using good prediction algorithms. However, such hyperspectral sensors are expensive and not widely available. Therefore, this paper investigates different machine and deep learning techniques to predict soil nutrient properties using only the red (R), green (G), and blue (B) band data, in order to propose a suitable machine/deep learning model that can be used as a rapid soil test. Another objective of this research is to observe and compare the prediction accuracy in three cases: (i) hyperspectral bands, (ii) the full spectrum of the visual band, and (iii) the three-channel RGB band, and to provide users with a guideline on which spectral information they should use to predict those soil properties. The outcome of this research helps to develop an easy-to-use mobile application for a quick soil test. This research also explores learning-based algorithms with significant feature combinations and compares their performance in predicting soil properties from visual band data. For this, we also explore the impact of dimensional reduction (i.e., principal component analysis) and transformations (i.e., empirical mode decomposition) of features. The results show that the proposed model can comparably predict the soil contents from the three-channel RGB data.
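As a rough, self-contained illustration of the kind of model this abstract describes, the sketch below fits a linear regression from three-channel RGB features to a soil property. Everything here is an assumption for illustration: the data are synthetic, and the linear form is a stand-in for the paper's actual machine/deep learning models.

```python
import numpy as np

# Synthetic stand-in: per-sample mean R, G, B reflectance and a
# "measured" soil property (e.g. moisture %). Not the paper's data.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 3))           # mean R, G, B per soil sample
w_true = np.array([-20.0, 5.0, 12.0])
y = X @ w_true + 30.0 + rng.normal(0, 0.5, 100)    # synthetic target values

# Fit y ~ X*w + b by ordinary least squares.
Xb = np.hstack([X, np.ones((100, 1))])             # append intercept column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

pred = Xb @ w
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))
```

In practice the paper compares several learners and feature transforms; this sketch only shows the RGB-features-to-property mapping in its simplest form.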
Determination of Munsell soil colour using smartphones
- Nodi, Sadia, Paul, Manoranjan, Robinson, Nathan, Wang, Liang, Rehman, Sabih
- Authors: Nodi, Sadia , Paul, Manoranjan , Robinson, Nathan , Wang, Liang , Rehman, Sabih
- Date: 2023
- Type: Text , Journal article
- Relation: Sensors Vol. 23, no. 6 (2023), p.
- Full Text:
- Reviewed:
- Description: Soil colour is one of the most important factors in agriculture for monitoring soil health and determining soil properties. For this purpose, Munsell soil colour charts are widely used by archaeologists, scientists, and farmers. The process of determining soil colour from the chart is subjective and error-prone. In this study, we used popular smartphones to capture soil colours from images in the Munsell Soil Colour Book (MSCB) to determine the colour digitally. These captured soil colours are then compared with the true colour determined using a commonly used sensor (Nix Pro-2). We observed colour reading discrepancies between the smartphone and Nix Pro readings. To address this issue, we investigated different colour models and introduced a colour-intensity relationship between the images captured by the Nix Pro and smartphones by exploring different distance functions. Thus, the aim of this study is to determine the Munsell soil colour accurately from the MSCB by adjusting the pixel intensity of the smartphone-captured images. Without any adjustment, the accuracy of individual Munsell soil colour determination is only (Formula presented.) for the top-5 predictions, whereas the accuracy of the proposed method is (Formula presented.), which is significant. © 2023 by the authors.
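The chip-matching step described in this abstract amounts to a nearest-neighbour search under some colour distance function. The sketch below uses plain Euclidean distance in RGB with made-up chip values; the paper explores several colour models and distance functions, and its calibrated reference values differ.

```python
import numpy as np

# Hypothetical chart: a few Munsell chip names with reference RGB values
# (values invented for illustration, not calibrated measurements).
chips = {
    "10YR 4/3": (110, 85, 60),
    "7.5YR 5/4": (140, 105, 75),
    "5YR 3/2": (90, 70, 60),
}

def nearest_chip(rgb, chart):
    """Return the chart entry with the smallest Euclidean distance to rgb."""
    p = np.asarray(rgb, dtype=float)
    return min(chart, key=lambda k: np.linalg.norm(p - np.asarray(chart[k], float)))

print(nearest_chip((112, 84, 58), chips))
```

The paper's contribution is the intensity adjustment applied to the smartphone image *before* this matching step, which this sketch omits.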
Soil moisture, organic carbon, and nitrogen content prediction with hyperspectral data using regression models
- Datta, Dristi, Paul, Manoranjan, Murshed, Manzur, Teng, Shyh Wei, Schmidtke, Leigh
- Authors: Datta, Dristi , Paul, Manoranjan , Murshed, Manzur , Teng, Shyh Wei , Schmidtke, Leigh
- Date: 2022
- Type: Text , Journal article
- Relation: Sensors (Basel, Switzerland) Vol. 22, no. 20 (2022), p.
- Full Text:
- Reviewed:
- Description: Soil moisture, soil organic carbon, and nitrogen content prediction are considered significant fields of study as they are directly related to plant health and food production. Direct estimation of these soil properties with traditional methods, for example, the oven-drying technique and chemical analysis, is time- and resource-consuming and is feasible only for smaller areas. With the significant development of remote sensing and hyperspectral (HS) imaging technologies, soil moisture, carbon, and nitrogen can be estimated over vast areas. This paper presents a generalized approach to predicting three different essential soil contents through a comprehensive study of various machine learning (ML) models that considers dimensional reduction of the feature space. In this study, we used three popular benchmark HS datasets captured in Germany and Sweden. The efficacy of different ML algorithms in predicting soil content is evaluated, and significant improvement is obtained when a specific range of bands is selected. The performance of the ML models is further improved by applying principal component analysis (PCA), an unsupervised dimensional reduction method. The effect of soil temperature on soil moisture prediction is also evaluated, and the results show that considering soil temperature alongside the HS bands does not improve soil moisture prediction accuracy. However, the combined effect of band selection and feature transformation using PCA significantly enhances the prediction accuracy for soil moisture, carbon, and nitrogen content. This study represents a comprehensive analysis of a wide range of established ML regression models using data preprocessing, effective band selection, and data dimension reduction, and attempts to understand which feature combinations provide the best accuracy. The outcomes of several ML models are verified with validation techniques, and the best- and worst-case scenarios for each soil content are noted. The proposed approach outperforms existing estimation techniques.
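The PCA step this abstract relies on can be sketched as follows. The "hyperspectral" matrix here is synthetic (100 samples by 200 bands, driven by three hidden factors); the real benchmark datasets, band counts, and variance threshold are assumptions for illustration.

```python
import numpy as np

# Synthetic stand-in for a hyperspectral data matrix.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 3))                  # 3 hidden factors drive the spectra
loadings = rng.normal(size=(3, 200))
spectra = latent @ loadings + 0.01 * rng.normal(size=(100, 200))

# PCA via SVD of the mean-centred data; keep enough components for 99% variance.
Xc = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
var = S ** 2 / np.sum(S ** 2)                       # variance share per component
k = int(np.searchsorted(np.cumsum(var), 0.99)) + 1
scores = Xc @ Vt[:k].T                              # reduced feature matrix

print(k, scores.shape)
```

The reduced `scores` matrix would then feed any of the regression models the study compares, in place of the raw bands.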
Efficient high-resolution video compression scheme using background and foreground layers
- Afsana, Fariha, Paul, Manoranjan, Murshed, Manzur, Taubman, David
- Authors: Afsana, Fariha , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 157411-157421
- Full Text:
- Reviewed:
- Description: Video coding using a dynamic background frame achieves better compression than traditional techniques by encoding background and foreground separately. This process significantly reduces the coding bits for the overall frame; however, encoding the background still requires many bits that can be compressed further for better coding efficiency. The cuboid coding framework has proven to be one of the most effective image compression methods: it exploits homogeneous pixel correlation within a frame and aligns better with object boundaries than traditional block-based coding. In a video sequence, the cuboid-based frame partitioning varies with changes in the foreground. However, since the background remains static for a group of pictures, cuboid coding exploits spatial pixel homogeneity better there. In this work, the impact of cuboid coding on the background frame for high-resolution videos (Ultra-High-Definition (UHD) and 360-degree videos) is investigated using the multilayer framework of SHVC. After the cuboid partitioning, the method of coarse frame generation has been improved with a novel idea that retains information sensitive to the human visual system. Unlike the traditional SHVC scheme, in the proposed method the cuboid-coded background and the foreground are encoded in separate layers in an implicit manner. Simulation results show that the proposed video coding method achieves an average BD-Rate reduction of 26.69% and a BD-PSNR gain of 1.51 dB against SHVC, with significant encoding time reduction for both UHD and 360-degree videos. It also achieves an average 13.88% BD-Rate reduction and 0.78 dB BD-PSNR gain compared to the existing relevant method proposed by X. Hoang Van. © 2013 IEEE.
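The BD-Rate figures quoted in this abstract come from the standard Bjøntegaard delta computation: fit log-rate as a polynomial of PSNR for each codec, integrate both fits over the overlapping PSNR range, and convert the average gap back to a percentage. The rate-distortion points below are toy values, not the paper's measurements.

```python
import numpy as np

def bd_rate(rate_a, psnr_a, rate_b, psnr_b):
    """Bjontegaard delta rate (%): average bitrate change of codec B vs. A
    over the overlapping PSNR range (negative means B saves bits)."""
    la, lb = np.log(rate_a), np.log(rate_b)
    pa = np.polyfit(psnr_a, la, 3)              # cubic fit: log-rate as f(PSNR)
    pb = np.polyfit(psnr_b, lb, 3)
    lo = max(min(psnr_a), min(psnr_b))
    hi = min(max(psnr_a), max(psnr_b))
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    ib = np.polyval(np.polyint(pb), hi) - np.polyval(np.polyint(pb), lo)
    return (np.exp((ib - ia) / (hi - lo)) - 1) * 100

# Toy RD points: codec B needs 20% fewer bits at every quality level.
r_a = np.array([1000.0, 2000.0, 4000.0, 8000.0])
q_a = np.array([32.0, 35.0, 38.0, 41.0])
r_b = 0.8 * r_a
print(round(bd_rate(r_a, q_a, r_b, q_a), 1))
```

With a uniform 20% rate saving the metric recovers exactly -20%, which is the sanity check for the implementation.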
Human-machine collaborative video coding through cuboidal partitioning
- Ahmmed, Ashek, Paul, Manoranjan, Murshed, Manzur, Taubman, David
- Authors: Ahmmed, Ashek , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2021
- Type: Text , Conference paper
- Relation: 2021 IEEE International Conference on Image Processing, ICIP 2021, Anchorage, USA 19-22 September 2021, Proceedings - International Conference on Image Processing, ICIP Vol. 2021-September, p. 2074-2078
- Full Text:
- Reviewed:
- Description: Video coding algorithms encode and decode an entire video frame, while feature coding techniques only preserve and communicate the most critical information needed for a given application. This is because video coding targets human perception, while feature coding aims at machine vision tasks. Recently, attempts have been made to bridge the gap between these two domains. In this work, we propose a video coding framework that leverages the commonality between human vision and machine vision applications using cuboids. Cuboids, estimated rectangular regions over a video frame, are computationally efficient, have a compact representation, and are object-centric. Such properties have already been shown to add value to traditional video coding systems. Here, cuboidal feature descriptors are extracted from the current frame and then employed to accomplish a machine vision task in the form of object detection. Experimental results show that a trained classifier yields superior average precision when equipped with a cuboidal-feature-oriented representation of the current test frame. Additionally, this representation costs 7% less in bit rate if the captured frames need to be communicated to a receiver. © 2021 IEEE.
A coarse representation of frames oriented video coding by leveraging cuboidal partitioning of image data
- Ahmmed, Ashek, Paul, Manoranjan, Murshed, Manzur, Taubman, David
- Authors: Ahmmed, Ashek , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2020
- Type: Text , Conference paper
- Relation: 22nd IEEE International Workshop on Multimedia Signal Processing, MMSP 2020, Virtual Tampere, Finland 21-24 September 2020
- Full Text:
- Reviewed:
- Description: Video coding algorithms attempt to minimize the significant commonality that exists within a video sequence. Each new video coding standard contains tools that perform this task more efficiently than its predecessors. In this work, we form a coarse representation of the current frame by minimizing commonality within that frame while preserving its important structural properties. The building blocks of this coarse representation are rectangular regions called cuboids, which are computationally simple and have a compact description. We then propose to employ the coarse frame as an additional source for predictive coding of the current frame. Experimental results show an improvement in bit rate savings over a reference HEVC codec, with a minor increase in codec computational complexity. © 2020 IEEE.
Depth sequence coding with hierarchical partitioning and spatial-domain quantization
- Shahriyar, Shampa, Murshed, Manzur, Ali, Mortuza, Paul, Manoranjan
- Authors: Shahriyar, Shampa , Murshed, Manzur , Ali, Mortuza , Paul, Manoranjan
- Date: 2020
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 30, no. 3 (2020), p. 835-849
- Full Text:
- Reviewed:
- Description: Depth coding in 3D-HEVC deforms object shapes due to block-level edge approximation and lacks efficient techniques to exploit the statistical redundancy arising from the frame-level clustering tendency in depth data for higher coding gain at near-lossless quality. This paper presents a standalone mono-view depth sequence coder, which preserves edges implicitly by limiting quantization to the spatial domain and exploits the frame-level clustering tendency efficiently with a novel binary tree-based decomposition (BTBD) technique. The BTBD can exploit the statistical redundancy in frame-level syntax, motion components, and residuals efficiently with fewer block-level prediction/coding modes and simpler context modeling for context-adaptive arithmetic coding. Compared with the depth coder in 3D-HEVC, the proposed coder achieves a significantly lower bitrate in the lossless to near-lossless quality range for mono-view coding and renders superior-quality synthetic views from depth maps compressed at the same bitrate and the corresponding texture frames. © 1991-2012 IEEE.
A novel no-reference subjective quality metric for free viewpoint video using human eye movement
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2018
- Type: Text , Conference proceedings
- Relation: 8th Pacific-Rim Symposium on Image and Video Technology, PSIVT 2017; Wuhan, China; 20th-24th November 2017; published in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 10749 LNCS, p. 237-251
- Full Text:
- Reviewed:
- Description: Free viewpoint video (FVV) allows users to interactively control the viewpoint and generate new views of a dynamic scene from any 3D position for a better 3D visual experience with depth perception. Multiview video coding exploits both texture and depth video information from various angles to encode a number of views to facilitate FVV. The usual practice for single-view or multiview quality assessment is to rely on objective quality metrics, such as the peak signal-to-noise ratio (PSNR) or the structural similarity index (SSIM), due to their simplicity and real-time applicability. However, the PSNR or SSIM requires a reference image for quality evaluation and cannot be successfully employed in FVV, as a newly synthesized view has no reference view to compare with. Conversely, the widely used subjective estimator, the mean opinion score (MOS), is often biased by the testing environment, the viewer's mood, domain knowledge, and many other factors that may influence the actual assessment. To address this limitation, in this work we devise a no-reference subjective quality assessment metric by simply exploiting the pattern of human eye browsing on FVV. Over FVV contents of different quality, the participants' eye-tracker-recorded spatio-temporal gaze data indicate a more concentrated eye-traversal approach for relatively better quality. Thus, we calculate Length, Angle, Pupil-size, and Gaze-duration features from the recorded gaze trajectories. A content- and resolution-invariant operation is carried out prior to synthesizing them with an adaptive weighted function to develop a new quality metric using eye traversal (QMET). Test results reveal that the proposed QMET performs better than the SSIM and MOS in assessing different aspects of coded video quality for a wide range of FVV contents.
- Description: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
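Two of the gaze-trajectory features named in this abstract, Length and Angle, can be computed directly from the recorded fixation points. The sketch below does so with numpy on a toy trajectory; the paper additionally uses pupil size and gaze duration, and its exact feature definitions and weighting are not reproduced here.

```python
import numpy as np

def gaze_features(points):
    """Path length and turning angles (degrees) from a gaze trajectory.
    Sketch only: pupil-size and gaze-duration features are omitted."""
    p = np.asarray(points, dtype=float)
    seg = np.diff(p, axis=0)                        # per-saccade displacement
    seg_len = np.linalg.norm(seg, axis=1)
    total_len = float(seg_len.sum())
    # turning angle between consecutive segments
    u, v = seg[:-1], seg[1:]
    cosang = np.einsum('ij,ij->i', u, v) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    angles = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return total_len, angles

# A short, straight scan (concentrated viewing): short path, zero turning.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
length, angles = gaze_features(straight)
print(length, angles)
```

An erratic trajectory over a low-quality view would yield a longer path and larger turning angles, which is the signal the QMET metric aggregates.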
Efficient video coding using visual sensitive information for HEVC coding standard
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2018
- Type: Text , Journal article
- Relation: IEEE Access Vol. 6, no. (2018), p. 75695-75708
- Full Text:
- Reviewed:
- Description: The latest high efficiency video coding (HEVC) standard introduces a large number of inter-mode block partitioning modes. The HEVC reference test model (HM) uses partially exhaustive tree-structured mode selection, which still explores a large number of prediction unit (PU) modes for a coding unit (CU). This increases encoding time, which prevents electronic devices with limited processing resources from using various features of HEVC. By analyzing homogeneity, residuals, and different statistical correlations among modes, many researchers speed up the encoding process by reducing the number of PU modes. However, these approaches cannot demonstrate rate-distortion (RD) performance similar to the HM due to their dependency on the existing Lagrangian cost function (LCF) within the HEVC framework. In this paper, to avoid complete dependency on the LCF in the initial phase, we exploit a visually sensitive foreground motion and spatial salient metric (FMSSM) in a block. To capture its motion and saliency features, we use dynamic background and visual saliency modeling, respectively. According to the FMSSM values, a subset of PU modes is then explored for encoding the CU. This preprocessing phase is independent of the existing LCF. As the proposed coding technique further reduces the number of PU modes using two simple criteria (i.e., motion and saliency), it outperforms the HM in terms of encoding time reduction. As it also encodes the uncovered and static background areas using the dynamic background frame as a substituted reference frame, it does not sacrifice quality. Test results reveal that the proposed method achieves a 32% average encoding time reduction over the HM without any quality loss for a wide range of videos.
A novel quality metric using spatiotemporal correlational data of human eye maneuver
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2017
- Type: Text , Conference proceedings
- Relation: 2017 International Conference on Digital Image Computing : Techniques and Applications, DICTA 2017; Sydney, Australia; 29th November-1st December 2017 Vol. 2017-December, p. 1-8
- Full Text:
- Reviewed:
- Description: The popularly used subjective estimator, the mean opinion score (MOS), is often biased by the testing environment, viewers' mood, domain expertise, and many other factors that may actively influence the actual assessment. We therefore devise a no-reference subjective quality assessment metric by exploiting the nature of human eye browsing on videos. The participants' eye-tracker-recorded gaze data indicate a more concentrated eye-traversing pattern for relatively better quality. We calculate length, angle, pupil-size, and gaze-duration features from the recorded gaze trajectory. A content- and resolution-invariant operation is carried out prior to synthesizing them with an adaptive weighted function to develop a new quality metric using eye traversal (QMET). Test results reveal that the quality evaluation carried out by QMET demonstrates a strong correlation with the most widely used peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and the MOS.
- Description: DICTA 2017 - 2017 International Conference on Digital Image Computing: Techniques and Applications
Adaptive weighted non-parametric background model for efficient video coding
- Chakraborty, Subrata, Paul, Manoranjan, Murshed, Manzur, Ali, Mortuza
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2017
- Type: Text , Journal article
- Relation: Neurocomputing Vol. 226, no. (2017), p. 35-45
- Full Text:
- Reviewed:
- Description: Dynamic background frame based video coding using mixture of Gaussian (MoG) background modelling has achieved better rate-distortion performance than the H.264 standard. However, such schemes suffer from high computation time, low coding efficiency for dynamic videos, and the need for prior knowledge of video content. In this paper, we introduce the non-parametric (NP) background modelling approach to the video coding domain. We present a novel background modelling technique, called weighted non-parametric (WNP), which adaptively balances the historical trend and the recent values of pixel intensities based on the content and characteristics of a particular video. WNP is successfully embedded into the latest HEVC video coding standard for better rate-distortion performance. Moreover, a novel scene adaptive non-parametric (SANP) technique is also developed to handle video sequences with highly dynamic backgrounds. Being non-parametric, the proposed techniques naturally exhibit superior performance in dynamic background modelling without a priori knowledge of the video data distribution.
Improved depth coding for HEVC focusing on depth edge approximation
- Podder, Pallab, Paul, Manoranjan, Rahaman, Motiur, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Rahaman, Motiur , Murshed, Manzur
- Date: 2017
- Type: Text , Journal article , acceptedVersion
- Relation: Signal Processing: Image Communication Vol. 55, no. (2017), p. 80-92
- Full Text:
- Reviewed:
- Description: The latest High Efficiency Video Coding (HEVC) standard has greatly improved coding efficiency compared to its predecessor H.264, an important share of which comes from the adoption of hierarchical block partitioning structures and an extended number of modes. The structure of the existing inter-modes is appropriate mainly for handling rectangular and square aligned motion patterns. However, it is not well suited to the block partitioning of depth objects having partial foreground motion with irregular edges and background. In such cases, the HEVC reference test model (HM) normally explores finer-level block partitioning, which requires more bits and encoding time to compensate for large residuals. Since motion detection is the underlying criterion for mode selection, in this work we use the energy concentration ratio feature of phase correlation to capture different types of motion in depth objects. For better motion modeling focused on depth edges, the proposed technique also uses an extra pattern mode comprising a group of templates with various rectangular and non-rectangular object shapes and edges. As the pattern mode can save bits by encoding only the foreground areas, and beats all other inter-modes in a block once selected, the proposed technique improves the rate-distortion performance. It also reduces encoding time by skipping further branching once the pattern mode is selected and by selecting a subset of modes using innovative pre-processing criteria. Experimentally, it saves 29% average encoding time and improves the Bjontegaard Delta peak signal-to-noise ratio by 0.10 dB compared to the HM.
Lossless hyperspectral image compression using binary tree based decomposition
- Shahriyar, Shampa, Paul, Manoranjan, Murshed, Manzur, Ali, Mortuza
- Authors: Shahriyar, Shampa , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2016
- Type: Text , Conference proceedings
- Relation: 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA); Gold Coast, Australia; 30th November-2nd December 2016 p. 428-435
- Full Text:
- Reviewed:
- Description: A hyperspectral (HS) image provides observational power beyond human vision capability but represents more than 100 times the data of a traditional image. To transmit and store the huge volume of an HS image, we argue that a fundamental shift is required from the existing "original pixel intensity"-based coding approaches using traditional image coders (e.g., JPEG) to "residual"-based approaches using a predictive coder that exploits band-wise correlation for better compression performance. Moreover, as HS images are used in detection or classification, they need to be in their original form; lossy schemes can trim off, along with compression, seemingly uninteresting data that can be important for specific analysis purposes. A modified lossless HS coder is required to exploit spatial-spectral redundancy using predictive residual coding. Every spectral band of an HS image can be treated as an individual frame of a video to impose inter-band prediction. In this paper, we propose a binary tree based lossless predictive HS coding scheme that arranges the residual frame into an integer residual bitmap. High spatial correlation in the HS residual frame is exploited by creating large homogeneous blocks of adaptive size, which are then coded as a unit using context-based arithmetic coding. On the standard HS data set, the proposed lossless predictive coding achieves compression ratios in the range of 1.92 to 7.94. We also compare the proposed method with mainstream lossless coders (JPEG-LS and lossless HEVC). Against JPEG-LS, HEVC Intra, and HEVC Main, the proposed technique reduces the bit-rate by 35%, 40%, and 6.79%, respectively, by exploiting spatial correlation in the predicted HS residuals.
Lossless image coding using hierarchical decomposition and recursive partitioning
- Ali, Mortuza, Murshed, Manzur, Shahriyar, Shampa, Paul, Manoranjan
- Authors: Ali, Mortuza , Murshed, Manzur , Shahriyar, Shampa , Paul, Manoranjan
- Date: 2016
- Type: Text , Journal article
- Relation: APSIPA Transactions on Signal and Information Processing Vol. 5, no. (2016), p. 1-11
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text:
- Reviewed:
- Description: State-of-the-art lossless image compression schemes, such as JPEG-LS and CALIC, have been proposed in the context-adaptive predictive coding framework. These schemes involve a prediction step followed by context-adaptive entropy coding of the residuals. However, the models for context determination proposed in the literature have been designed using ad-hoc techniques. In this paper, we take an alternative approach: we fix a simpler context model and then rely on a systematic technique to efficiently exploit spatial correlation and achieve efficient compression. The essential idea is to decompose the image into binary bitmaps such that the spatial correlation that exists among non-binary symbols is captured as the correlation among a few bit positions. The proposed scheme then encodes the bitmaps in a particular order based on the simple context model. However, instead of encoding a bitmap as a whole, we partition it into rectangular blocks, induced by a binary tree, and then independently encode the blocks. The motivation for partitioning is to explicitly identify the blocks within which the statistical correlation remains the same. On a set of standard test images, the proposed scheme, using the same predictor as JPEG-LS, achieved an overall bit-rate saving of 1.56% against JPEG-LS. © 2016 The Authors.
QMET : A new quality assessment metric for no-reference video coding by using human eye traversal
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2016
- Type: Text , Conference proceedings
- Relation: 2016 International Conference on Image and Vision Computing New Zealand, IVCNZ 2016; Palmerston North, New Zealand; 21st-22nd November 2016 p. 1-6
- Full Text:
- Reviewed:
- Description: Subjective quality assessment (SQA) is in ever-growing demand due to its in-depth interaction with human cognition, and the addition of a no-reference scheme could equip SQA techniques to tackle further challenges. The widely used objective metrics, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), and the subjective estimator, mean opinion score (MOS), require the original image for quality evaluation, which limits their use in situations with no reference. In this work, we present a no-reference SQA technique that could be an impressive substitute for reference-based approaches to quality evaluation. The High Efficiency Video Coding (HEVC) reference test model (HM15.0) is first exploited to generate five different qualities of the eight HEVC-recommended classes of sequences. To assess different aspects of coded video quality, a group of ten participants is employed, and their eye-tracker (ET) recorded data demonstrate closer correlation among gaze plots for relatively better quality video contents. We therefore calculate the amount of approximation of smooth eye traversal (ASET) using distance, angle, and pupil-size features from the recorded gaze trajectory data and develop a new quality metric based on eye traversal (QMET). Experimental results show that the quality evaluation carried out by QMET is highly correlated with the HM-recommended coding quality. The performance of QMET is also compared with the PSNR and SSIM metrics to cross-validate their effectiveness.
- Description: International Conference Image and Vision Computing New Zealand
Fast coding strategy for HEVC by motion features and saliency applied on difference between successive image blocks
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur
- Date: 2015
- Type: Text , Conference proceedings
- Relation: Conference: Pacific-Rim Symposium on Image and Video Technology, Auckland, 23-27th Nov, 2016, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9431, p. 175-186
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text:
- Reviewed:
- Description: Introducing a number of innovative and powerful coding tools, the High Efficiency Video Coding (HEVC) standard promises double the compression efficiency of its predecessor H.264 with similar perceptual quality. The increased computational time complexity, however, is an important issue for the video coding research community. This paper attempts to reduce this complexity of HEVC by efficient selection of appropriate block-partitioning modes based on motion features and saliency applied to the difference between successive image blocks. As this difference gives us explicit visible motion and salient information, we develop a cost function by combining the motion features and the image-difference salient feature. The combined features are then converted into an area of interest (AOI) based binary pattern for the current block. This pattern is then compared with a previously defined codebook of binary pattern templates to select a subset of modes. Motion estimation (ME) and motion compensation (MC) are performed only on the selected subset of modes, without exhaustive exploration of all modes available in HEVC. The experimental results reveal a 42% reduction in the encoding time complexity of the HEVC encoder with similar subjective and objective image quality.
Fast intermode selection for HEVC video coding using phase correlation
- Podder, Pallab, Paul, Manoranjan, Murshed, Manzur, Chakraborty, Subrata
- Authors: Podder, Pallab , Paul, Manoranjan , Murshed, Manzur , Chakraborty, Subrata
- Date: 2015
- Type: Text , Conference proceedings , Conference paper
- Relation: 2014 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2014; Wollongong, Australia; 25th-27th November 2014 p. 1-8
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text:
- Reviewed:
- Description: The recent High Efficiency Video Coding (HEVC) standard demonstrates higher rate-distortion (RD) performance than its predecessor H.264/AVC by using several new tools, especially larger and asymmetric inter-mode variable size motion estimation and compensation. This requires more than 4 times the computational time of H.264/AVC. As a result, reducing this time while maintaining the standard quality of the video has always been a big concern for researchers, and the reduction of computational time by smart selection of appropriate modes in HEVC is our motivation. To accomplish this task, we use phase correlation to approximate the motion information between current and reference blocks by comparing it with a number of different binary pattern templates, and then select a subset of motion estimation modes without exhaustively exploring all possible modes. The experimental results show that the proposed HEVC-PC (HEVC with Phase Correlation) scheme outperforms the standard HEVC scheme in terms of computational time while preserving the same quality of the video sequences. More specifically, around 40% of the encoding time is saved compared to the exhaustive mode selection in HEVC. © 2014 IEEE.
- Description: 2014 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2014
Joint texture and depth coding using cuboid data compression
- Paul, Manoranjan, Chakraborty, Subrata, Murshed, Manzur, Podder, Pallab
- Authors: Paul, Manoranjan , Chakraborty, Subrata , Murshed, Manzur , Podder, Pallab
- Date: 2015
- Type: Text , Conference proceedings
- Relation: 2015 18th International Conference on Computer and Information Technology (ICCIT); Dhaka, Bangladesh; 21st-23rd December 2015 p. 138-143
- Full Text:
- Reviewed:
- Description: The latest multiview video coding (MVC) standards, such as 3D-HEVC and H.264/MVC, normally encode texture and depth videos separately. A significant amount of rate-distortion and computational performance is sacrificed by separate encoding due to the lack of exploitation of joint information. Separate encoding also creates a synchronization issue for 3D scene formation in the decoder. Moreover, the hierarchical frame referencing architecture in MVC creates random access frame delay. In this paper, we develop an encoder and decoder framework in which texture and depth video can be encoded jointly by forming and encoding a 3D cuboid using high dimensional entropy coding. The results from our experiments show that the proposed framework outperforms 3D-HEVC in rate-distortion performance and significantly reduces computational time by reducing random access frame delay.
A novel video coding scheme using a scene adaptive non-parametric background model
- Chakraborty, Subrata, Paul, Manoranjan, Murshed, Manzur, Ali, Mortuza
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2014
- Type: Text , Conference paper
- Relation: 16th IEEE International Workshop on Multimedia Signal Processing, MMSP 2014 p. 1-6
- Relation: http://purl.org/au-research/grants/arc/DP130103670
- Full Text:
- Reviewed:
- Description: Video coding techniques utilising background frames provide better rate-distortion performance than the latest video coding standard by exploiting coding efficiency in uncovered background areas. Parametric approaches, such as mixture of Gaussian (MoG) based background modeling, have been widely used; however, they require prior knowledge about the test videos for parameter estimation. Recently introduced non-parametric (NP) background modeling techniques have successfully improved video coding performance through an HEVC-integrated coding scheme. By its inherent nature, the NP technique exhibits superior performance in dynamic background scenarios compared to the MoG-based technique, without a priori knowledge of the video data distribution. Although NP-based coding schemes have shown promising coding performance, they face a number of key challenges: (a) determining the optimal subset of training frames for generating a suitable background that can be used as a reference frame during coding, (b) incorporating dynamic changes in the background effectively after the initial background frame is generated, (c) managing frequent scene changes that lead to performance degradation, and (d) optimizing the coding quality ratio between an I-frame and other frames under bit rate constraints. In this study, we develop a new scene-adaptive coding scheme based on the NP technique that addresses these challenges by incorporating a new continuously updating background generation process. Extensive experimental results are also provided to validate the effectiveness of the new scheme.
An efficient video coding technique using a novel non-parametric background model
- Chakraborty, Subrata, Paul, Manoranjan, Murshed, Manzur, Ali, Mortuza
- Authors: Chakraborty, Subrata , Paul, Manoranjan , Murshed, Manzur , Ali, Mortuza
- Date: 2014
- Type: Text , Conference proceedings
- Relation: 2014 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2014; Chengdu; China; 14th-18th July 2014 p. 1-6
- Full Text:
- Reviewed:
- Description: Video coding with a background frame, extracted via mixture of Gaussian (MoG) based background modeling, provides better rate-distortion performance than the latest video coding standard by exploiting coding efficiency in uncovered background areas. However, it suffers from high computation time, low coding efficiency for dynamic videos, and the requirement of prior knowledge of video content. In this paper, we present a novel adaptive weighted non-parametric (WNP) background modeling technique and successfully embed it into the HEVC video coding standard. Being non-parametric (NP), the proposed technique naturally exhibits superior performance in dynamic background scenarios compared to the MoG-based technique, without a priori knowledge of the video data distribution. In addition, the WNP technique significantly reduces the noise-related drawbacks of existing NP techniques, providing better-quality video coding with much lower computation time, as demonstrated through extensive comparative studies against NP, MoG and HEVC techniques.