Crop monitoring by multimodal remote sensing : a review
- Authors: Karmakar, Priyabrata , Teng, Shyh , Murshed, Manzur , Pang, Shaoning , Li, Yanyu , Lin, Hao
- Date: 2024
- Type: Text , Journal article , Review
- Relation: Remote Sensing Applications: Society and Environment Vol. 33, no. (2024), p.
- Full Text:
- Reviewed:
- Description: Effective approaches to achieving food safety and security can prevent catastrophic situations. It is therefore necessary to monitor agricultural crops on a regular basis. This can be achieved by capturing data from various remote sensing (RS) devices and then processing them. Most RS devices are useful for monitoring crops and analysing different stages of plant growth. However, individual devices have limitations. To overcome these, multimodal remote sensing (MRS) methods have been gradually gaining popularity. In the multimodal approach, data from more than one modality are used together to obtain a better outcome. This is because different modalities of data, when used together, can complement each other towards the same objective, combining their strengths while reducing their limitations. MRS methods have been found to be particularly useful for crop monitoring as they allow for the integration of data from multiple sources, resulting in a more comprehensive understanding of plant growth and development. By using MRS methods, it is possible to obtain a more accurate and detailed analysis of crop conditions, leading to improved decision-making and, ultimately, better crop yields. In this paper, we explore how MRS methods have been successfully utilised in crop monitoring and how the data obtained from these methods can provide valuable insights into the health and development of plants. © 2023 The Authors
A robust local texture descriptor in the parametric space of the weibull distribution
- Authors: Tania, Sheikh , Karmakar, Gour , Teng, Shyh , Murshed, Manzur
- Date: 2023
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 25, no. (2023), p. 6053-6066
- Full Text: false
- Reviewed:
- Description: Research in texture feature approximation is still in the embryonic stage because of difficulties in developing a sound theoretical model to express the unique pattern in the intensity-variation of pixels in the neighbourhood of the pixel-of-interest so that it can sufficiently discriminate different textures. Local texture descriptors are widely used in image segmentation as they comprise pixel-wise features. The Weber local descriptor (WLD) with differential excitation and gradient orientation components, inspired by Weber's Law, has been leveraged in the state-of-the-art iterative contraction and merging (ICM) image segmentation technique. However, WLD has inherent drawbacks in the formulation of the components that limit its discriminatory capability. This paper introduces a novel texture descriptor by directly modelling the distribution of intensity-variation in the parametric space of the Weibull distribution using its shape and scale parameters. A unified 'joint scale' texture property is introduced, which can discriminate textures better than the individual parameters while keeping the length of the descriptor shorter. Additionally, the accuracy of WLD's gradient orientation component is improved by using an extended Sobel operator and expressing gradients in -
Comparative analysis of machine and deep learning models for soil properties prediction from hyperspectral visual band
- Authors: Datta, Dristi , Paul, Manoranjan , Murshed, Manzur , Teng, Shyh Wei , Schmidtke, Leigh
- Date: 2023
- Type: Text , Journal article
- Relation: Environments Vol. 10, no. 5 (2023), p. 77
- Full Text:
- Reviewed:
- Description: Estimating various properties of soil, including moisture, carbon, and nitrogen, is crucial for studying their correlation with plant health and food production. However, conventional methods such as oven-drying and chemical analysis are laborious, expensive, and only feasible for a limited land area. With the advent of remote sensing technologies like multi/hyperspectral imaging, it is now possible to predict soil properties non-invasively and cost-effectively for a large expanse of bare land. Recent research shows the possibility of predicting those soil contents from a wide range of hyperspectral data using good prediction algorithms. However, these kinds of hyperspectral sensors are expensive and not widely available. Therefore, this paper investigates different machine and deep learning techniques to predict soil nutrient properties using only the red (R), green (G), and blue (B) band data, in order to propose a suitable machine/deep learning model that can be used as a rapid soil test. Another objective of this research is to observe and compare the prediction accuracy in three cases, (i) the hyperspectral band, (ii) the full spectrum of the visual band, and (iii) the three-channel RGB band, and to provide a guideline to the user on which spectrum information they should use to predict those soil properties. The outcome of this research helps to develop a mobile application that is easy to use for a quick soil test. This research also explores learning-based algorithms with significant feature combinations and their performance comparisons in predicting soil properties from visual band data. For this, we also explore the impact of dimensional reduction (i.e., principal component analysis) and transformations (i.e., empirical mode decomposition) of features. The results show that the proposed model can comparably predict the soil contents from the three-channel RGB data.
Depth-based sampling and steering constraints for memoryless local planners
- Authors: Nguyen, Binh , Nguyen, Linh , Choudhury, Tanveer , Keogh, Kathleen , Murshed, Manzur
- Date: 2023
- Type: Text , Journal article
- Relation: Journal of Intelligent and Robotic Systems: Theory and Applications Vol. 109, no. 3 (2023), p.
- Full Text:
- Reviewed:
- Description: By utilizing only depth information, the paper introduces a novel two-stage planning approach that enhances computational efficiency and planning performance for memoryless local planners. First, a depth-based sampling technique is proposed to identify and eliminate a specific type of in-collision trajectory among the sampled candidates. Specifically, all trajectories that have obscured endpoints are found by querying the depth values and are then excluded from the sampled set, which can significantly reduce the computational workload required in collision checking. Subsequently, we apply a tailored local planning algorithm that employs a direction cost function and a depth-based steering mechanism to prevent the robot from being trapped in local minima. Our planning algorithm is theoretically proven to be complete in convex obstacle scenarios. To validate the effectiveness of our DEpth-based both Sampling and Steering (DESS) approaches, we conducted experiments in simulated environments where a quadrotor flew through cluttered regions with multiple obstacles of various sizes. The experimental results show that DESS significantly reduces computation time in local planning compared to the uniform sampling method, resulting in a planned trajectory with a lower minimized cost. More importantly, our success rates for navigation to different destinations in testing scenarios are improved considerably compared to the fixed-yawing approach. © 2023, The Author(s).
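The endpoint test of the first stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the DESS implementation: it assumes a pinhole camera model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, endpoints expressed in the camera frame with z pointing forward, and a policy of also rejecting endpoints that fall outside the depth image. The function names are hypothetical.

```python
def is_obscured(endpoint, depth_image, fx, fy, cx, cy):
    """Return True if the endpoint lies behind the surface seen by the depth camera."""
    x, y, z = endpoint  # camera frame, z pointing forward (assumed convention)
    if z <= 0:
        return True  # behind the camera: cannot be verified, so reject (assumed policy)
    # Pinhole projection to pixel coordinates (assumed camera model).
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    rows, cols = len(depth_image), len(depth_image[0])
    if not (0 <= v < rows and 0 <= u < cols):
        return True  # outside the field of view: reject (assumed policy)
    # The endpoint is obscured when the measured depth is closer than the endpoint.
    return depth_image[v][u] < z


def prune_candidates(endpoints, depth_image, fx, fy, cx, cy):
    """Keep only the trajectory endpoints that are not obscured."""
    return [p for p in endpoints
            if not is_obscured(p, depth_image, fx, fy, cx, cy)]
```

Only the surviving candidates would then be passed to full collision checking, which is where the abstract reports the saving in computation.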
A commonality modeling framework for enhanced video coding leveraging on the cuboidal partitioning based representation of frames
- Authors: Ahmmed, Ashek , Murshed, Manzur , Paul, Manoranjan , Taubman, David
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 24, no. (2022), p. 4446-4457
- Full Text: false
- Reviewed:
- Description: Video coding algorithms attempt to minimize the significant commonality that exists within a video sequence. Each new video coding standard contains tools that can perform this task more efficiently than its predecessors. Modern video coding systems are block-based, wherein commonality modeling is carried out only from the perspective of the block that needs to be coded next. In this work, we argue for a commonality modeling approach that can provide a seamless blending between global and local homogeneity information. For this purpose, the frame to be coded is first recursively partitioned into rectangular regions based on the homogeneity information of the entire frame. Each resulting rectangular region's feature descriptor is then taken to be the average intensity of all the pixels within the region. In this way, the proposed approach generates a coarse representation of the current frame by minimizing both global and local commonality. This coarse frame is computationally simple and has a compact representation. It attempts to preserve important structural properties of the current frame, which can be seen subjectively as well as in the improved rate-distortion performance of a reference scalable HEVC coder that employs the coarse frame as a reference frame for encoding the current frame. © 1999-2012 IEEE.
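The partition-then-average idea described above can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the split criterion (minimising the summed squared error of the two children), the greedy choice of which cuboid to split next, and the stopping rule (`max_cuboids`) are all assumptions made for the example.

```python
def region_mean(frame, top, left, bottom, right):
    """Average intensity of frame rows top..bottom-1, columns left..right-1."""
    values = [frame[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(values) / len(values)


def region_sse(frame, top, left, bottom, right):
    """Sum of squared deviations from the region mean (inhomogeneity measure)."""
    mean = region_mean(frame, top, left, bottom, right)
    return sum((frame[r][c] - mean) ** 2
               for r in range(top, bottom) for c in range(left, right))


def best_split(frame, top, left, bottom, right):
    """Return the pair of child cuboids with the lowest total SSE, or None."""
    best = None
    cuts = [((top, left, r, right), (r, left, bottom, right))
            for r in range(top + 1, bottom)]           # horizontal cuts
    cuts += [((top, left, bottom, c), (top, c, bottom, right))
             for c in range(left + 1, right)]          # vertical cuts
    for a, b in cuts:
        cost = region_sse(frame, *a) + region_sse(frame, *b)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return None if best is None else (best[1], best[2])


def cuboid_partition(frame, max_cuboids):
    """Greedily split the least homogeneous cuboid until max_cuboids is reached."""
    cuboids = [(0, 0, len(frame), len(frame[0]))]
    while len(cuboids) < max_cuboids:
        candidates = [c for c in cuboids if region_sse(frame, *c) > 0]
        if not candidates:
            break  # every cuboid is already perfectly homogeneous
        target = max(candidates, key=lambda c: region_sse(frame, *c))
        children = best_split(frame, *target)
        cuboids.remove(target)
        cuboids.extend(children)
    return cuboids


def coarse_frame(frame, cuboids):
    """Replace every pixel by the mean intensity of the cuboid that contains it."""
    coarse = [[0.0] * len(frame[0]) for _ in frame]
    for top, left, bottom, right in cuboids:
        mean = region_mean(frame, top, left, bottom, right)
        for r in range(top, bottom):
            for c in range(left, right):
                coarse[r][c] = mean
    return coarse
```

In this sketch only the cuboid boundaries and one mean value per cuboid describe the coarse frame, which is what makes the representation compact.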
Bidirectional mapping coupled GAN for generalized zero-shot learning
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Image Processing Vol. 31, no. (2022), p. 721-733
- Full Text:
- Reviewed:
- Description: Bidirectional mapping-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen-unseen classes and preserving the distinction between seen-unseen classes is crucial for GZSL methods. However, existing methods only learn the underlying distribution of seen data, although unseen class semantics are available in the GZSL problem setting. Most methods neglect retaining seen-unseen classes distinction and use the learned distribution to recognize seen and unseen data. Consequently, they do not perform well. In this work, we utilize the available unseen class semantics alongside seen class semantics and learn joint distribution through a strong visual-semantic coupling. We propose a bidirectional mapping coupled generative adversarial network (BMCoGAN) by extending the concept of the coupled generative adversarial network into a bidirectional mapping model. We further integrate a Wasserstein generative adversarial optimization to supervise the joint distribution learning. We design a loss optimization for retaining distinctive information of seen-unseen classes in the synthesized features and reducing bias towards seen classes, which pushes synthesized seen features towards real seen features and pulls synthesized unseen features away from real seen features. We evaluate BMCoGAN on benchmark datasets and demonstrate its superior performance against contemporary methods. © 1992-2012 IEEE.
Discrete cosine basis oriented motion modeling with cuboidal applicability regions for versatile video coding
- Authors: Ahmmed, Ashek , Hamidouche, Wassim , Lambert, Andrew , Pickering, Mark , Murshed, Manzur
- Date: 2022
- Type: Text , Conference paper
- Relation: 2022 Picture Coding Symposium, PCS 2022, San Jose, USA, 7-9 December 2022, 2022 Picture Coding Symposium, PCS 2022 - Proceedings p. 337-341
- Full Text: false
- Reviewed:
- Description: The relentless expansion of video-based applications is underpinned by video coding technologies. The latest video coding standard, versatile video coding (VVC), provides compression performance superior to that of its predecessors. In this regard, motion modeling plays a central role. Experimental results have shown that the discrete cosine basis oriented motion model can describe complex motion better than the affine motion model adopted in VVC. Hence, in this paper we propose to augment the VVC motion modeling technique with a set of discrete cosine basis oriented motion models, where the applicability region of each such motion model is determined by non-overlapping rectangular regions known as cuboids. Experimental results show that bit rate savings of up to 2.37% are achievable with respect to a VVC reference. © 2022 IEEE.
Dynamic mesh commonality modeling using the cuboidal partitioning
- Authors: Ahmmed, Ashek , Paul, Manoranjan , Murshed, Manzur , Pickering, Mark
- Date: 2022
- Type: Text , Conference paper
- Relation: 2022 IEEE International Conference on Visual Communications and Image Processing, VCIP 2022, Suzhou, China, 13-16 December 2022, 2022 IEEE International Conference on Visual Communications and Image Processing, VCIP 2022
- Full Text: false
- Reviewed:
- Description: For 3D object representation, volumetric contents like meshes and point clouds provide suitable formats. However, a dynamic mesh sequence may require a significantly large amount of data because it consists of information that varies with time. Hence, to facilitate the storage and transmission of such content, efficient compression technologies are required. MPEG has started standardization activities aiming to develop a mesh compression standard that would be able to handle dynamic meshes with time-varying connectivity information and time-varying attribute maps. The attribute maps are features associated with the mesh surface and stored as 2D images/videos. In this paper, we propose to capture the commonality information in the dynamic mesh attribute maps using the cuboidal partitioning algorithm. This algorithm is capable of modeling both the global and local commonality within an image in a compact and computationally efficient way. Experimental results show that the proposed approach can outperform the anchor HEVC codec, suggested by MPEG to encode such sequences, with bit rate savings of up to 3.66%. © 2022 IEEE.
Efficient scalable 360-degree video compression scheme using 3d cuboid partitioning
- Authors: Afsana, Fariha , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2022
- Type: Text , Conference paper
- Relation: 29th IEEE International Conference on Image Processing, ICIP 2022 p. 996-1000
- Full Text: false
- Reviewed:
- Description: Video coding techniques minimize spatial and temporal redundancies inherent in video sequences based on non-overlapping block-based image partitioning. Because they depend on information from already encoded neighboring blocks, these algorithms lack efficient techniques to exploit the overall global redundancies. Compared to traditional block-based coding, the cuboid coding (2D) framework has been proven to be a more effective method of image compression that exploits global redundancy by considering homogeneous pixel correlation within a frame. In this paper, we extend the idea of 2D cuboid coding to exploit both local and global redundancy in a video sequence by adopting a three-dimensional (3D) cuboid partitioning scheme to improve SHVC compression of 360-degree videos. The proposed method considers a group of successive frames as a 3D cuboid and recursively partitions it into sub-3D cuboids, where static information over a selected GOP shares the same cuboid and moving regions share new cuboids with better-defined objects. All the 3D cuboids are then encoded to create a coarse representation of the video stream. Experiments indicate that the proposed framework significantly outperforms its relevant benchmarks, notably by 17.18% (average) in BD-Rate reduction and 0.82 dB in BD-PSNR gain with respect to the standard SHVC codec. © 2022 IEEE.
Efficient scalable UHD/360-video coding by exploiting common information with cuboid-based partitioning
- Authors: Afsana, Fariha , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2022
- Type: Text , Journal article
- Relation: IEEE Transactions on Circuits and Systems for Video Technology Vol. 32, no. 6 (2022), p. 3961-3977
- Full Text: false
- Reviewed:
- Description: The scalable extension of High Efficiency Video Coding (SHVC) can code Ultra High-Definition (UHD) video, including 360-degree video, so that a single bitstream can serve various devices with different display resolutions and qualities. To improve the SHVC compression efficiency, this paper proposes a novel intra- and inter-frame coding scheme that first separates the common/visually important information and then applies a cuboid-based variable-size block partitioning and coding process to that information in the base layer. In cuboid-based partitioning, a video frame is partitioned into arbitrarily shaped rectangular regions, known as cuboids, based on the distribution of relatively homogeneous pixel values. As the cuboid adopts a variable block partitioning based on the homogeneity of the data values, the partitioned blocks have better alignment with object boundaries. Moreover, in the cuboid coding process, only the partitioning tree information and a single value for each block need to be coded, which takes fewer bits and less computational time than the traditional SHVC base layer. To verify the performance of the proposed method, we embedded the proposed scheme as a base layer into the standard SHVC reference software and used several popular UHD/360-degree videos. The experimental results indicate that the proposed scalable coding strategy achieves an average of 14.04% BD-Rate reduction and 0.61 dB BD-PSNR gain for UHD/360-video compared to the operation points provided by an SHVC conforming encoder. © 1991-2012 IEEE.
Human pose based video compression via forward-referencing using deep learning
- Authors: Rajin, S.M. Ataul Karim , Murshed, Manzur , Paul, Manoranjan , Teng, Shyh , Ma, Jiangang
- Date: 2022
- Type: Text , Conference paper
- Relation: 2022 IEEE International Conference on Visual Communications and Image Processing, VCIP 2022, Suzhou, China, 13-16 December 2022, 2022 IEEE International Conference on Visual Communications and Image Processing, VCIP 2022
- Full Text: false
- Reviewed:
- Description: To exploit high temporal correlations in video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translation motion of the moving objects, it is susceptible to other types of affine motion and object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual frames in future time by predicting the pose using a generative adversarial network (GAN). Therefore, modelling the high-level structure of human pose is able to exploit semantic correlation by predicting human actions and determining its trajectory. Video surveillance applications will benefit as stored 'big' surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding by modelling human pose from the already-encoded frames and using the generated frame at the current time as an additional forward-referencing frame. It is expected that the proposed approach can overcome the limitations of the traditional backward-referencing frames by predicting the blocks containing the moving objects with lower residuals. Our experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high motion video sequences compared to standard video coding. © 2022 IEEE.
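The benefit of an additional forward-referencing frame can be illustrated with a per-block reference choice. This is a minimal sketch under stated assumptions, not the paper's codec: it compares only co-located blocks (no motion search), uses the sum of absolute differences (SAD) as the residual measure, and takes the generated frame as a given input. All function names are hypothetical.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block (smaller at the frame border)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def choose_references(current, backward_ref, forward_ref, block_size):
    """For each block, pick the reference frame that leaves the lower residual."""
    choices = {}
    for top in range(0, len(current), block_size):
        for left in range(0, len(current[0]), block_size):
            cur = get_block(current, top, left, block_size)
            back_cost = sad(cur, get_block(backward_ref, top, left, block_size))
            fwd_cost = sad(cur, get_block(forward_ref, top, left, block_size))
            if fwd_cost < back_cost:
                choices[(top, left)] = ("forward", fwd_cost)
            else:
                choices[(top, left)] = ("backward", back_cost)
    return choices
```

Blocks covering a moving person would be expected to favour the generated frame when the pose prediction is good, while static background blocks keep the conventional backward reference.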
Integrated generalized zero-shot learning for fine-grained classification
- Authors: Shermin, Tasfia , Teng, Shyh , Sohel, Ferdous , Murshed, Manzur , Lu, Guojun
- Date: 2022
- Type: Text , Journal article
- Relation: Pattern Recognition Vol. 122, no. (2022), p.
- Full Text:
- Reviewed:
- Description: Embedding learning (EL) and feature synthesizing (FS) are two of the popular categories of fine-grained GZSL methods. EL or FS using global features cannot discriminate fine details in the absence of local features. On the other hand, EL or FS methods exploiting local features either neglect direct attribute guidance or global information. Consequently, neither method performs well. In this paper, we propose to explore global and direct attribute-supervised local visual features for both EL and FS categories in an integrated manner for fine-grained GZSL. The proposed integrated network has an EL sub-network and a FS sub-network. Consequently, the proposed integrated network can be tested in two ways. We propose a novel two-step dense attention mechanism to discover attribute-guided local visual features. We introduce new mutual learning between the sub-networks to exploit mutually beneficial information for optimization. Moreover, we propose to compute source-target class similarity based on mutual information and transfer-learn the target classes to reduce bias towards the source domain during testing. We demonstrate that our proposed method outperforms contemporary methods on benchmark datasets. © 2021 Elsevier Ltd
Multi-objective dynamic virtual machine consolidation algorithm for cloud data centers with highly energy proportional servers and heterogeneous workload
- Authors: Khan, Md Anit , Paplinski, Andrew , Khan, Abdul , Murshed, Manzur , Buyya, Rajkumar
- Date: 2022
- Type: Text , Book chapter
- Relation: New Frontiers in Cloud Computing and Internet of Things Chapter 3 p. 69-106
- Full Text: false
- Reviewed:
- Description: Present Dynamic VM Consolidation (DVMC) algorithms assume that optimal energy efficiency can be achieved via maximum load on Physical Machines (PMs). Such an assumption has become invalid with the advent of highly energy proportional PMs. Additionally, these algorithms consider only varying resource demand, ignoring the dissimilarity of workload finishing times, aka the VM Release Time (VMRT), whereas both aspects are strongly associated with energy consumption. Consequently, traditional algorithms fail to deliver optimal performance under real Cloud scenarios. Although minimization of VM migration brings massive benefits for a Cloud Data Center (CDC), it is the complete opposite of what is needed to minimize energy consumption through DVMC. As such, our proposed multi-objective Stochastic Release Time aware DVMC (SRTDVMC) algorithm is unique in addressing concomitant minimization of energy consumption and VM migration in the presence of state-of-the-art PMs and heterogeneous workloads. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Soil moisture, organic carbon, and nitrogen content prediction with hyperspectral data using regression models
- Authors: Datta, Dristi , Paul, Manoranjan , Murshed, Manzur , Teng, Shyh Wei , Schmidtke, Leigh
- Date: 2022
- Type: Text , Journal article
- Relation: Sensors (Basel, Switzerland) Vol. 22, no. 20 (2022), p.
- Full Text:
- Reviewed:
- Description: Soil moisture, soil organic carbon, and nitrogen content prediction are considered significant fields of study as they are directly related to plant health and food production. Direct estimation of these soil properties with traditional methods, for example, the oven-drying technique and chemical analysis, is a time- and resource-consuming approach and can cover only small areas. With the significant development of remote sensing and hyperspectral (HS) imaging technologies, soil moisture, carbon, and nitrogen can be estimated over vast areas. This paper presents a generalized approach to predicting three different essential soil contents using a comprehensive study of various machine learning (ML) models by considering dimensional reduction in feature spaces. In this study, we have used three popular benchmark HS datasets captured in Germany and Sweden. The efficacy of different ML algorithms is evaluated to predict soil content, and significant improvement is obtained when a specific range of bands is selected. The performance of ML models is further improved by applying principal component analysis (PCA), an unsupervised dimensional reduction method. The effect of soil temperature on soil moisture prediction is evaluated in this study, and the results show that when the soil temperature is considered with the HS band, the soil moisture prediction accuracy does not improve. However, the combined effect of band selection and feature transformation using PCA significantly enhances the prediction accuracy for soil moisture, carbon, and nitrogen content. This study represents a comprehensive analysis of a wide range of established ML regression models using data preprocessing, effective band selection, and data dimension reduction, and attempts to understand which feature combinations provide the best accuracy. The outcomes of several ML models are verified with validation techniques, and the best- and worst-case scenarios in terms of soil content are noted. The proposed approach outperforms existing estimation techniques.
Adversarial network with multiple classifiers for open set domain adaptation
- Authors: Shermin, Tasfia , Lu, Guojun , Teng, Shyh , Murshed, Manzur , Sohel, Ferdous
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Transactions on Multimedia Vol. 23, no. (2021), p. 2732-2744
- Full Text:
- Reviewed:
- Description: Domain adaptation aims to transfer knowledge from a domain with adequate labeled samples to a domain with scarce labeled samples. Prior research has introduced various open set domain adaptation settings to extend the applications of domain adaptation methods in real-world scenarios. This paper focuses on the type of open set domain adaptation setting where the target domain has both a private ('unknown classes') label space and the shared ('known classes') label space, whereas the source domain only has the 'known classes' label space. Prevalent distribution-matching domain adaptation methods are inadequate in such a setting, which demands adaptation from a smaller source domain to a larger and more diverse target domain with more classes. To address this specific open set domain adaptation setting, prior research introduced a domain adversarial model that uses a fixed threshold for distinguishing known from unknown target samples and is poor at handling negative transfers. We extend this adversarial model and propose a novel adversarial domain adaptation model with multiple auxiliary classifiers. The proposed multi-classifier structure introduces a weighting module that evaluates distinctive domain characteristics to assign the target samples weights that better represent whether they are likely to belong to the known or unknown classes. This encourages positive transfers during adversarial training and simultaneously reduces the domain gap between the shared classes of the source and target domains. A thorough experimental investigation shows that our proposed method outperforms existing domain adaptation methods on a number of domain adaptation datasets. © 1999-2012 IEEE.
Detection of Malleefowl Mounds from Point Cloud Data
- Authors: Parvin, Nahida , Awrangjeb, Mohammad , Irvin, Marc , Florentine, Singarayer , Murshed, Manzur , Lu, Guojun
- Date: 2021
- Type: Text , Conference paper
- Relation: 2021 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2021, Gold Coast, 29 November to 1 December 2021
- Full Text: false
- Reviewed:
- Description: Airborne light detection and ranging (LiDAR) data have become a cost- and time-efficient means of estimating the size of timid fauna populations through the identification of artefacts that evidence their occurrence in a large, hostile geographic area. The unobtrusive detection method helps conservation managers to assess the stability of a population and to design appropriate conservation programs. Here we propose a mound (nest) detection method for Australia's native iconic bird, the Malleefowl, from point cloud data, which can be manipulated to act as a surrogate for population data. Existing detection methods rely largely on manual observations and are therefore not efficient for covering large and remote areas. The proposed mound detection method can identify mound features based on the height and intensity values provided by the point cloud data. Each candidate mound point is initially selected by applying a height threshold utilising the classified ground points and their corresponding digital elevation model (DEM). Then, another threshold based on the intensity range derived from ground-truth mound area analysis is applied to the extracted initial mound points to find the final candidate mound points. These extracted points are then used to generate a binary mask, in which the potential mound points are sparse. To connect those points, a morphological filter is applied to the binary image, which separates the mound from the remaining non-mound objects. To isolate the mound from other non-mound objects, a morphological cleaning operation and a connected component analysis are carried out on the mask. The non-mound objects are removed from the mask utilising the area property of mounds derived from the empirical analysis of ground-truth observations. Finally, the effectiveness of the proposed technique is calculated based on ground truth. Although the mound shapes and structures are highly variable in nature, our height and intensity-based mound point extraction method detected 55% of the ground-truthed mounds. © 2021 IEEE.
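The processing chain described above (height threshold, intensity threshold, binary mask, morphological filtering, connected components, area filter) can be sketched on a rasterised grid. This is an illustrative simplification, not the authors' implementation: it assumes the point cloud has already been gridded into height-above-ground and intensity rasters, uses a single 3x3 dilation as the morphological filter, and takes all threshold values as hypothetical parameters.

```python
def candidate_mask(height, intensity, min_height, intensity_range):
    """Mark cells that pass both the height and the intensity thresholds."""
    lo, hi = intensity_range
    return [[h >= min_height and lo <= i <= hi
             for h, i in zip(h_row, i_row)]
            for h_row, i_row in zip(height, intensity)]


def dilate(mask):
    """3x3 binary dilation that connects sparse neighbouring candidates."""
    rows, cols = len(mask), len(mask[0])
    return [[any(mask[rr][cc]
                 for rr in range(max(r - 1, 0), min(r + 2, rows))
                 for cc in range(max(c - 1, 0), min(c + 2, cols)))
             for c in range(cols)]
            for r in range(rows)]


def components(mask):
    """8-connected components of a binary mask, each as a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen, result = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            stack, comp = [(r, c)], []
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                comp.append((cr, cc))
                for nr in range(max(cr - 1, 0), min(cr + 2, rows)):
                    for nc in range(max(cc - 1, 0), min(cc + 2, cols)):
                        if mask[nr][nc] and (nr, nc) not in seen:
                            seen.add((nr, nc))
                            stack.append((nr, nc))
            result.append(comp)
    return result


def detect_mounds(height, intensity, min_height, intensity_range, area_range):
    """Return the components whose area falls inside the expected mound range."""
    mask = dilate(candidate_mask(height, intensity, min_height, intensity_range))
    lo, hi = area_range
    return [comp for comp in components(mask) if lo <= len(comp) <= hi]
```

In practice the thresholds and the area range would come from the ground-truth analysis mentioned in the abstract; the values used when calling `detect_mounds` here are placeholders.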
Dynamic point cloud compression using a cuboid oriented discrete cosine based motion model
- Authors: Ahmmed, Ashek , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2021
- Type: Text , Conference paper
- Relation: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 Vol. 2021-June, p. 1935-1939
- Full Text: false
- Reviewed:
- Description: Immersive media representation formats based on point clouds have underpinned significant opportunities for extended reality applications. Point clouds in their uncompressed format require a very high data rate for storage and transmission. The video-based point cloud compression technique projects a dynamic point cloud into geometry and texture video sequences. The projected texture video is then coded using a modern video coding standard such as HEVC. Since the properties of projected texture video frames are different from those of traditional video frames, HEVC-based commonality modeling can be inefficient. An improved commonality modeling technique is proposed that employs discrete cosine basis oriented motion models, and the domains of such models are approximated by homogeneous regions called cuboids. Experimental results show that the proposed commonality modeling technique can yield savings in bit rate of up to 4.17%. © 2021 IEEE
Dynamic point cloud geometry compression using cuboid based commonality modelling framework
- Authors: Ahmmed, Ashek , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2021
- Type: Text , Conference paper
- Relation: 2021 IEEE International Conference on Image Processing, ICIP 2021, Anchorage, USA, 19-21 September 2021, Proceedings - International Conference on Image Processing, ICIP Vol. 2021-September, p. 2159-2163
- Full Text: false
- Reviewed:
- Description: Point clouds in their uncompressed format require a very high data rate for storage and transmission. The video-based point cloud compression (V-PCC) technique projects a dynamic point cloud into geometry and texture video sequences. The projected geometry and texture video frames are then encoded using a modern video coding standard such as HEVC. However, the HEVC encoder is unable to fully exploit the global commonality that exists within a geometry frame and between successive geometry frames. This is because, in HEVC, the partitioning of the current frame starts from a rigid 64 × 64 pixel level without considering the structure of the scene to be coded. In this paper, an improved commonality modeling framework is proposed, leveraging cuboid-based frame partitioning, to encode point cloud geometry frames. The associated frame-partitioning scheme is based on statistical properties of the current geometry frame and therefore yields a flexible block partitioning structure composed of cuboids. Additionally, the proposed commonality modeling approach is computationally efficient and has a compact representation. Experimental results show that if the V-PCC reference encoder is augmented by the proposed commonality modeling technique, bit rate savings of 2.71% and 4.25% are achieved for the geometry sequences of full-body and upper-body human point clouds, respectively. © 2021 IEEE.
Efficient high-resolution video compression scheme using background and foreground layers
- Authors: Afsana, Fariha , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2021
- Type: Text , Journal article
- Relation: IEEE Access Vol. 9, no. (2021), p. 157411-157421
- Full Text:
- Reviewed:
- Description: Video coding using a dynamic background frame achieves better compression than traditional techniques by encoding the background and foreground separately. This process significantly reduces the coding bits for the overall frame; however, encoding the background still requires many bits, which can be compressed further to achieve better coding efficiency. The cuboid coding framework has been proven to be one of the most effective methods of image compression; it exploits homogeneous pixel correlation within a frame and aligns better with object boundaries than traditional block-based coding. In a video sequence, the cuboid-based frame partitioning varies with changes in the foreground. However, since the background remains static for a group of pictures, cuboid coding can better exploit spatial pixel homogeneity there. In this work, the impact of cuboid coding on the background frame for high-resolution videos (Ultra-High-Definition (UHD) and 360-degree videos) is investigated using the multilayer framework of SHVC. After the cuboid partitioning, the method of coarse frame generation has been improved with a novel idea that retains information to which human vision is sensitive. Unlike the traditional SHVC scheme, in the proposed method the cuboid-coded background and the foreground are encoded in separate layers in an implicit manner. Simulation results show that the proposed video coding method achieves an average BD-Rate reduction of 26.69% and a BD-PSNR gain of 1.51 dB against SHVC, with a significant reduction in encoding time, for both UHD and 360-degree videos. It also achieves an average BD-Rate reduction of 13.88% and a BD-PSNR gain of 0.78 dB compared to the existing relevant method proposed by X. Hoang Van. © 2013 IEEE.
Human-machine collaborative video coding through cuboidal partitioning
- Authors: Ahmmed, Ashek , Paul, Manoranjan , Murshed, Manzur , Taubman, David
- Date: 2021
- Type: Text , Conference paper
- Relation: 2021 IEEE International Conference on Image Processing, ICIP 2021, Anchorage, USA, 19-22 September 2021, Proceedings - International Conference on Image Processing, ICIP Vol. 2021-September, p. 2074-2078
- Full Text:
- Reviewed:
- Description: Video coding algorithms encode and decode an entire video frame, while feature coding techniques preserve and communicate only the most critical information needed for a given application. This is because video coding targets human perception, while feature coding aims at machine vision tasks. Recently, attempts have been made to bridge the gap between these two domains. In this work, we propose a video coding framework that uses cuboids to leverage the commonality that exists between human vision and machine vision applications. This is because cuboids, which are estimated rectangular regions over a video frame, are computationally efficient, have a compact representation, and are object-centric. Such properties have already been shown to add value to traditional video coding systems. Herein, cuboidal feature descriptors are extracted from the current frame and then employed to accomplish a machine vision task in the form of object detection. Experimental results show that a trained classifier yields superior average precision when equipped with a cuboidal-feature-oriented representation of the current test frame. Additionally, this representation costs 7% less in bit rate if the captured frames need to be communicated to a receiver. © 2021 IEEE.