Multi-resolution intra-predictive coding of 3D point cloud attributes

Abstract

We propose an intra frame predictive strategy for compression of 3D point cloud attributes. Our approach is integrated with the region adaptive graph Fourier transform (RAGFT), a multi-resolution transform formed by a composition of localized block transforms, which produces a set of low pass (approximation) and high pass (detail) coefficients at multiple resolutions. Since the transform operations are spatially localized, RAGFT coefficients at a given resolution may still be correlated. To exploit this phenomenon, we propose an intra-prediction strategy, in which decoded approximation coefficients are used to predict uncoded detail coefficients. The prediction residuals are then quantized and entropy coded. For the 8i dataset, we obtain gains up to $0.5$ db as compared to intra predicted point cloud compresion based on the region adaptive Haar transform (RAHT).

Index Terms— 3D point clouds, intra prediction, multiresolution transform, graph fourier transform

1 Introduction

3D point clouds (3DPC) have become the preferred representation of 3D scenes, people and objects [1, 2, 3]. They consist of a list of points coordinates in 3D space along with color attributes. Recent advancements in real time 3D capture, along with potential applications to entertainment and immersive communications, have prompted research on 3DPC compression [1, 2].

This paper focuses on 3DPC attribute compression. Earlier approaches for compression of 3DPC attributes were based on transform coding techniques, that is, transformation followed by quantization and entropy coding, similar to modern image codecs. For 3DPCs, a popular approach is based on the region adaptive hierarchical (or Haar) transform (RAHT) [4, 5]. In image and video coding, intra and inter prediction are often combined with transform coding. Predictive methods for 3DPC compression have only recently become popular, which may be explained by the fact that good spatial and temporal predictors are harder to obtain for 3DPCs, since these i) represent complex surface and non-surface geometries, ii) have spatial frame to frame irregularity, and iii) lack temporal consistency. There has been substantial recent work to apply transform coding to inter frame (temporal) prediction residuals [3, 6, 7, 8, 9]. However, intra frame prediction is less explored. In [10, 11], block based intra-prediction similar to video was considered. More recently, MPEG adopted an intra prediction strategy for the RAHT (I-RAHT) [12], which uses a multi-resolution prediction, instead of traditional directional block based prediction.

In I-RAHT, low pass decoded RAHT coefficients are used to predict high pass RAHT coefficients. To achieve this, a higher resolution predictor point cloud is constructed from lower resolution decoded coefficients. Each predictor attribute in the high resolution point cloud is obtained as a linear combination of the nearest attributes in the lower resolution point cloud. Then the detail coefficients of this predictor signal are used to predict the target detail coefficients. While this approach to construct a higher resolution point cloud is suitable for the RAHT, since it is formed by a composition of $2\times 2$ orthogonal transforms, it fails to take into account the more complex geometry of the higher resolution point cloud. This becomes even more important when considering larger block transforms with more points used by the region adaptive graph Fourier transform (RAGFT) [13], as shown in Figure 1.

In this paper we propose an intra-predictive coding framework for RAGFT (I-RAGFT). We use the same multi-resolution prediction strategy used in I-RAHT, where decoded approximation coefficients are used to predict and code detail coefficients, as depicted in Figure 2. However, different from the prediction algorithm used by I-RAHT, our proposed predictors exploits the higher resolution point cloud geometry. We start by projecting the low resolution signal onto the higher resolution geometry by zero padding and applying a one level inverse RAGFT. The resulting interpolated signal is piece-wise constant, as depicted by the colored voxels in Figure 1(b). Then a smoothing graph filter [14, 15, 16, 17] is applied using a graph constructed on top of the higher resolution point cloud. While our approach is new for 3DPCs, similar ideas have been used to improve image coding by predicting wavelet coefficients with learning based super resolution algorithms [18].

The difference between our proposed approach and I-RAHT is illustrated in Figure 1. I-RAHT uses a block-level predictor, since the approximation coefficients of the nearest neighboring blocks are used (Figure 1(a)), while our approach uses a point-level predictor, where fine resolution point values are interpolated, then filtered (across block boundaries), so that only nearby points (instead of nearby blocks) are used to compute the predictor. We show through compression experiments that the proposed approach can outperform I-RAHT, when using uniform quantization and adaptive RLGR entropy coding [19].

The rest of the paper is organized as follows. In Section 2 we introduce the multi-resolution intra-prediction coding framework. The proposed predictors for the RAGFT are described in 3. We show numerical experiments and conclusions in Sections 4 and 5, respectively.

2 Multi resolution predictive coding

Consider a point cloud with point coordinates stored in the $N\times 3$ matrix ${\bf V}=[{\bf v}_{i}]$ , and attributes ${\bf a}=[a_{i}]$ . We will assume the attributes are processed with a multi-resolution transform, which at each resolution level takes an attribute vector ${\bf a}_{\ell+1}$ , and produces approximation and detail coefficients at resolution $\ell$

\begin{bmatrix}{\bf a}_{\ell}\\ {\bf d}_{\ell}\end{bmatrix}={\bf T}_{\ell}{\bf a}_{\ell+1}.

(1)

The transform matrix ${\bf T}_{\ell}$ is an orthonormal matrix, thus ${\bf T}_{\ell}^{-1}={\bf T}_{\ell}^{\top}$ . We assume that the original 3DPC attributes are stored at the highest resolution, $L$ , so that ${\bf a}_{L}={\bf a}$ . After applying (1) $L$ times, we obtain transform coefficients

\begin{bmatrix}{\bf a}_{0}^{\top},{\bf d}_{0}^{\top},{\bf d}_{1}^{\top},\cdots,{\bf d}_{L-1}^{\top}\end{bmatrix}^{\top}.

(2)

Several orthonormal transforms for 3DPC attributes can be described this way, including the block based graph Fourier transform [20], RAHT [4] and RAGFT [13]. Since RAGFT is a composition of spatially localized block transforms, there may be additional redundancy between transformed coefficients, similar to what is observed for the RAHT [12]. While previous approaches code the coefficients in (2) independently ignoring their dependencies across blocks, in this work we exploit them to improve coding efficiency.

Denote by $\widehat{\hbox{\boldmath$\Phi$}}_{0}={\cal Q}({\bf a}_{o})$ , the quantized low pass coefficients, where ${\cal Q}(\cdot)$ is a quantization operator, and denote by $\hat{{\bf a}}_{0}={\cal Q}^{-1}(\widehat{\hbox{\boldmath$\Phi$}}_{0})$ the corresponding decoded coefficients. Now we define several quantities recursively. The decoded approximation coefficient at resolution $\ell$ is given by $\hat{{\bf a}}_{\ell}$ , while the corresponding detail coefficient at the same resolution is ${\bf d}_{\ell}$ . We will assume there is function ${\cal P}_{\ell}(\hat{{\bf a}}_{\ell})=\tilde{{\bf d}}_{\ell}$ , that predicts detail coefficients from approximation coefficients. This predictor will be explained in detail in the next section. Using ${\cal P}_{\ell}$ we compute a residual and quantize it obtaining

\widehat{\hbox{\boldmath$\Delta$}}_{\ell}={\cal Q}({\bf d}_{\ell}-\tilde{{\bf d}}_{\ell}).

(3)

If the predictor is good enough, coding $\widehat{\hbox{\boldmath$\Delta$}}_{\ell}$ is more efficient than coding the transform coefficients ${\cal Q}({\bf d}_{\ell})$ directly. The decoded details coefficients are given by

\hat{{\bf d}}_{\ell}=\tilde{{\bf d}}_{\ell}+{\cal Q}^{-1}(\widehat{\hbox{\boldmath$\Delta$}}_{\ell}).

(4)

The decoded approximation and detail coefficients at resolution $\ell$ are used to obtain approximation coefficients at resolution $\ell+1$ with

\hat{{\bf a}}_{\ell+1}={\bf T}^{-1}_{\ell}\begin{bmatrix}\hat{{\bf a}}_{\ell}\\ \hat{{\bf d}}_{\ell}\end{bmatrix}.

(5)

The quantities $\widehat{\hbox{\boldmath$\Delta$}}_{\ell}$ can be computed recursively as depicted in Figure 2, starting from the lowest resolution coefficients ${\bf a}_{0}$ . A typical transform coding strategy would quantize and entropy encode (2). In this work we encode

\begin{bmatrix}\widehat{\hbox{\boldmath$\Phi$}}_{0}^{\top},\widehat{\hbox{\boldmath$\Delta$}}_{0}^{\top},\widehat{\hbox{\boldmath$\Delta$}}_{1}^{\top},\cdots,\widehat{\hbox{\boldmath$\Delta$}}_{L-1}^{\top}\end{bmatrix}^{\top}.

(6)

3 Multi resolution prediction

In this section we describe the function ${\cal P}_{\ell}(\hat{{\bf a}}_{\ell})=\tilde{{\bf d}_{\ell}}$ , used to predict the high pass coefficients ${\bf d}_{\ell}$ .

3.1 Graph representation of point clouds

While forming transform coefficients (2), many transforms [4, 13], either explicitely or implicitely, produce sets of point coordinates at various resolutions (e.g., by down-sampling), thus for each resolution $\ell$ , we have a point cloud $({\bf V}_{\ell},{\bf a}_{\ell})$ , where ${\bf V}_{L}={\bf V}$ and ${\bf a}_{L}={\bf a}$ .

For each of these point clouds consider a graph ${\cal G}_{\ell}=({\bf V}_{\ell},{\bf W}_{\ell},{\cal E}_{\ell})$ . The matrix ${\bf W}_{\ell}$ is the adjacency matrix and ${\bf D}_{\ell}=\operatorname*{diag}(\sum({\bf W}_{\ell}))$ is the degree matrix. The graph has edge set ${\cal E}_{\ell}$ , where $ij\in{\cal E}_{\ell}$ if point ${\bf v}_{i,\ell}$ is “near” to point ${\bf v}_{j,\ell}$ . Edge weights are given by

({\bf W}_{\ell})_{ij}=\frac{1}{\|{\bf v}_{i,\ell}-{\bf v}_{j,\ell}\|}.

(7)

3.2 Constructing the predictor

The first step in constructing our predictor is interpolation of the low resolution point cloud $({\bf V}_{\ell},\hat{{\bf a}}_{\ell})$ by zero padding, and inverse transformation, leading to the attribute signal

{\bf b}_{\ell+1}={\bf T}_{\ell}^{-1}\begin{bmatrix}\hat{{\bf a}}_{\ell}\\ \mathbf{0}\end{bmatrix},

(8)

and higher resolution point cloud $({\bf V}_{\ell+1},{\bf b}_{\ell+1})$ . We propose applying a smoothing (low pass) filter to the point cloud $({\bf V}_{\ell+1},{\bf b}_{\ell+1})$ , and then applying the transform ${\bf T}_{\ell}$ . The predictor block from Figure 2 is depicted in Figure 3, with the caveat that $\tilde{{\bf a}}_{\ell}$ is ignored.

3.3 Graph filtering of RAGFT coefficients

In the RAGFT [13], each transform coefficient has an importance weight that depends on the size (number of points) of the region of the point cloud they represent. The $i$ th point of the point cloud of resolution $\ell$ is denoted by ${\bf v}_{i,\ell}$ . The point ${\bf v}_{i,\ell}$ has a set of children at resolution $\ell+1$ denoted by ${\cal C}_{i,\ell}$ . The index set of the children is denoted by ${\cal I}_{i,\ell}=\{j:{\bf v}_{j,\ell+1}\in{\cal C}_{i,\ell}\}$ . We denote the weight of point ${\bf v}_{i,\ell}$ by $q_{i,\ell}$ . Based on this parent-child relation, the weights can be computed recursively as

q_{i,\ell}=\sum_{j\in{\cal I}_{i,\ell}}q_{j,\ell+1},

(9)

and the weights at full resolution are $q_{i,L}=1$ for all $i$ . For the RAGFT, the interpolation equation (8) has a closed form, thus the $j$ th entry of ${\bf b}_{\ell+1}$ is equal to

b_{j,\ell+1}=\sqrt{\frac{q_{j,\ell+1}}{q_{i,\ell}}}\hat{a}_{i,\ell},

(10)

where point ${\bf v}_{i,\ell}$ is the parent of ${\bf v}_{j,\ell+1}$ . Note that if we define the diagonal matrix of weights by ${\bf Q}_{\ell+1}$ , with $jj$ entry equal to $q_{j,\ell+1}$ , then the interpolated signal is

{\bf b}_{\ell+1}={\bf Q}^{1/2}_{\ell+1}\left(\sum_{i}\frac{\hat{a}_{i,\ell}}{\sqrt{q_{i,\ell}}}\mathbf{1}_{{\cal I}_{i,\ell}}\right),

(11)

where $\mathbf{1}_{{\cal I}_{i,\ell}}$ is the indicator of the set ${\cal I}_{i,\ell}$ . Thus after normalization by the square root of the point weights, the interpolated signal is piece-wise constant. In fact, this signal is constant within cube shaped regions, because the RAGFT is is a composition of localized “block” transforms. Figure 1(b) depicts this piece-wise constant signal. Given that ${\bf b}_{\ell+1}$ has this form, we take into account the normalization matrix ${\bf Q}_{\ell+1}$ , and the piece-wise constant structure, when designing our smoothing operator. The first step is to normalize the entries of ${\bf b}_{\ell+1}$ by their point weights using ${\bf Q}^{-1/2}_{\ell+1}$ . The resulting piece-wise constant signal is filtered to smooth out boundaries between regions. Finally, we scale back each attribute using the matrix ${\bf Q}^{1/2}_{\ell+1}$ . The proposed filtering operation has the form

\tilde{{\bf b}}_{\ell+1}={\bf Q}^{1/2}_{\ell+1}{\bf D}^{-1}_{\ell+1}{\bf W}_{\ell+1}{\bf Q}^{-1/2}_{\ell+1}{\bf b}_{\ell+1}.

(12)

At resolution $\ell$ , the adjacency matrix ${\bf W}_{\ell}$ is constructed using $k$ nearest neighbors on the point cloud geometry ${\bf V}_{\ell}$ . The signal $\tilde{{\bf b}}_{\ell+1}$ is transformed using (1), resulting in $\tilde{{\bf a}}_{\ell}$ and $\tilde{{\bf d}}_{\ell}$ .

3.4 Complexity

The proposed intra predicted RAGFT can be computed by performing a forward and inverse RAGFT, along with a graph construction and a graph filtering operation per resolution level. For a point cloud with $N$ points the forward and inverse RAGFT have complexity ${\cal O}(N)$ . If the octree [21, 22] has to be computed, overall complexity increases to ${\cal O}(N\log(N))$ . If the number of points at resolution $\ell$ is equal to $N_{\ell}$ , computation of ${\bf b}_{\ell}$ using (11) takes ${\cal O}(N_{\ell})$ time. Graph construction with $k$ nearest neighbors requires ${\cal O}(kN_{\ell}\log(N_{\ell}))$ operations, while $\tilde{{\bf b}}_{\ell}$ can be computed using sparse matrix vector products in ${\cal O}(kN_{\ell})$ time. If $k$ is ${\cal O}(1)$ , then prediction operations at all resolutions have complexity ${\cal O}(N\log(N))$ , resulting in an overall ${\cal O}(N\log(N))$ complexity of the intra predicted RAGFT.

4 Numerical results

We integrate intra prediction with the RAGFT with $2\times 2\times 2$ blocks at all resolution levels, thus each localized block transform processes at most $8$ points. The proposed predictors are implemented using graph filters, that use $K$ nearest neighbor graphs, with $K=7$ neighbors per point. We denote this approach by “I-RAGFT”. We also implement predictors similar to those used by the G-PCC implementation of I-RAHT using KNN from the lower resolution graph (see Fig. 1(a)). For this approach we set $K=5$ , found after optimizing for best performance. This approach is denoted by “I-RAGFT LowRes”. We also compare against the RAGFT with high resolution block size equal to $16$ , which achieves highest coding performance compared to RAHT, and the RAGFT with high resolution block size equal to $2$ , which provides a mild improvement over RAHT (see [13] for details). For RAGFT and intra RAGFT, we uniformly quantize coefficients and entropy code them using adaptive run-length Golomb-Rice (RLGR) algorithm [19].

The state of the art in non-video-based compression of point cloud attributes, called G-PCC [5] uses I-RAHT. Several techniques are implemented in G-PCC to improve coding efficiency beyond that obtained through intra prediction. Some of them include jointly encoding YUV coefficients that are equal to zero, and adaptive quantization schemes of AC coefficients. These techniques could also be applied to the RAGFT but their implementation goes beyond the scope of this paper and we leave them for future work. In order to obtain a performance comparison between our method and the current state-of-the-art, under the same conditions, we changed the source code of G-PCC modifying both the quantizer and the entropy encoder schemes. The adaptive quantizer was replaced with a uniform quantizer and the entropy encoder was modified to an adaptive RLGR algorithm. Moreover, we encode YUV coefficients independently. In Figure 4 we report rate distortion curves for the “longdress” and “loot” point clouds of the “8iVFBv2” dataset [23].

Incorporating intra prediction into the RAGFT improves coding performance significantly. The difference between “RAGFT $b=2$ ” and “I-RAGFT LowRes” is about $1.5$ db, while the gain obtained by using better predictors (“I-RAGFT Proposed”) can reach up to $2.5$ db. In [13], the RAGFT implemented with larger block sizes ( $b_{L}=16$ ) led to the best results, since spatial redundancy can be removed more efficiently using graph transforms on larger blocks. Our results show that by combining low resolution intra prediction and RAGFT with small block size (“I-RAGFT $b=2$ ”), we can outperform the best RAGFT at low bitrates. However this approach is still inferior to I-RAHT. The proposed predictor based on the higher resolution point cloud, combined with the RAGFT achieves the best rate distortion performance among all methods considered, and outperforms I-RAHT by up to $0.5$ db.

In Figure 5 we compare decoded attributes obtained after compression using I-RAGFT, I-RAHT and RAGFT ( $b_{L}=16$ ), at the same rate of $0.493bpp$ . Transform coding with the RAGFT produces strong blocking artifacts, similar to those associated to JPEG encoding. Both intra coding approaches have improved visual quality with respect to RAGFT, while also using smaller transforms, thus avoiding blocking artifacts. I-RAHT produces an over smoothed reconstruction, which can be accredited to the less localized filtering operations. Since the I-RAGFT uses higher resolution predictors, image details are better preserved, however it suffers from other localized artifacts.

5 Conclusion

We studied the use of intra prediction for compression of 3D point cloud attributes with the RAGFT. As with the RAHT, we showed that it is possible to remove redundancy between RAGFT coefficients at different resolutions. In the RAHT, a high resolution point cloud is predicted from a lower resolution point cloud. While this approach is efficient for predicting RAHT coefficients, it is less effective for the RAGFT. This is because the RAGFT uses larger block transforms, that makes prediction more challenging. To overcome this issue, we proposed a different prediction based on interpolation and graph signal filtering, that takes into account the higher resolution geometry, thus adapting better to the RAGFT. Compression experiments show the proposed approach outperforms intra predictive coding of RAHT coefficients at all rates by up to $0.5$ db.

References

[1] Sebastian Schwarz, Marius Preda, Vittorio Baroncini, Madhukar Budagavi, Pablo Cesar, Philip A Chou, Robert A Cohen, Maja Krivokuća, Sébastien Lasserre, Zhu Li, et al., “Emerging MPEG standards for point cloud compression,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 1, pp. 133–148, 2018.
[2] D Graziosi, O Nakagami, S Kuma, A Zaghetto, T Suzuki, and A Tabatabai, “An overview of ongoing point cloud compression standardization activities: Video-based (V-PCC) and geometry-based (G-PCC),” APSIPA Transactions on Signal and Information Processing, vol. 9, 2020.
[3] Eduardo Pavez, Philip A Chou, Ricardo L De Queiroz, and Antonio Ortega, “Dynamic polygon clouds: representation and compression for VR/AR,” APSIPA Transactions on Signal and Information Processing, vol. 7, 2018.
[4] Ricardo L De Queiroz and Philip A Chou, “Compression of 3D point clouds using a region-adaptive hierarchical transform,” IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3947–3956, 2016.
[5] 3DG, “G-PCC codec description,” Tech. Rep. W19620, ISO/IEC JTC1/SC29/WG11 MPEG, 2020.
[6] Yiqun Xu, Wei Hu, Shanshe Wang, Xinfeng Zhang, Shiqi Wang, Siwei Ma, Zongming Guo, and Wen Gao, “Predictive generalized graph Fourier transform for attribute compression of dynamic point clouds,” IEEE Transactions on Circuits and Systems for Video Technology, 2020.
[7] André L Souto and Ricardo L de Queiroz, “On predictive RAHT for dynamic point cloud coding,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 2701–2705.
[8] Ricardo L de Queiroz and Philip A Chou, “Motion-compensated compression of dynamic voxelized point clouds,” IEEE Transactions on Image Processing, vol. 26, no. 8, pp. 3886–3895, 2017.
[9] Dorina Thanou, Philip A Chou, and Pascal Frossard, “Graph-based compression of dynamic 3D point cloud sequences,” IEEE Transactions on Image Processing, vol. 25, no. 4, pp. 1765–1778, 2016.
[10] Robert A Cohen, Dong Tian, and Anthony Vetro, “Point cloud attribute compression using 3-d intra prediction and shape-adaptive transforms,” in 2016 Data Compression Conference (DCC). IEEE, 2016, pp. 141–150.
[11] Yiting Shao, Qi Zhang, Ge Li, Zhu Li, and Li Li, “Hybrid point cloud attribute compression using slice-based layered structure and block-based intra prediction,” in Proceedings of the 26th ACM international conference on Multimedia, 2018, pp. 1199–1207.
[12] Sébastien Lasserre and David Flynn, “On an improvement of RAHT to exploit attribute correlation,” input document m47378, ISO/IEC JTC1/SC29/WG11 MPEG, 2019.
[13] Eduardo Pavez, Benjamin Girault, Antonio Ortega, and Philip A. Chou, “Region adaptive graph Fourier transform for 3D point clouds,” in International Conference on Image Processing (ICIP), 2020.
[14] Antonio Ortega, Pascal Frossard, Jelena Kovačević, José MF Moura, and Pierre Vandergheynst, “Graph signal processing: Overview, challenges, and applications,” Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, 2018.
[15] Peyman Milanfar, “A tour of modern image filtering: New insights and methods, both practical and theoretical,” IEEE signal processing magazine, vol. 30, no. 1, pp. 106–128, 2012.
[16] Gene Cheung, Enrico Magli, Yuichi Tanaka, and Michael K Ng, “Graph spectral image processing,” Proceedings of the IEEE, vol. 106, no. 5, pp. 907–930, 2018.
[17] B. Girault, A. Ortega, and S.S. Narayanan, “Irregularity-aware graph Fourier transforms,” IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5746–5761, 2018.
[18] Antoni Dimitriadis and David Taubman, “Augmenting JPEG2000 with wavelet coefficient prediction,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 1276–1280.
[19] Henrique S Malvar, “Adaptive run-length/Golomb-Rice encoding of quantized generalized gaussian sources with unknown statistics,” in Data Compression Conference (DCC’06). IEEE, 2006, pp. 23–32.
[20] Cha Zhang, Dinei Florencio, and Charles Loop, “Point cloud attribute compression with graph transform,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 2066–2070.
[21] C.L. Jackins and S. L. Tanimoto, “Oct-trees and their use in representing three-dimensional objects,” Computer Graphics and Image Processing, vol. 14, no. 3, pp. 249–270, 1980.
[22] D. Meagher, “Geometric modeling using octree encoding,” Computer graphics and image processing, vol. 19, no. 2, pp. 129–147, 1982.
[23] Eugene d’Eon, Bob Harrison, Taos Myers, and Philip A. Chou, “8i voxelized full bodies-a voxelized point cloud dataset,” ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006, 2017.