3D Model Retrieval Using 3D-DFT Spectral and Cepstral Descriptors of Local Octants

Conventional 3D-DFT features proposed for 3D model retrieval extract spectral/cepstral features from the spectrum/cepstrum of the whole 3D model, without considering its local properties. In this paper, a local 3D-DFT spectral descriptor and a local 3D-DFT cepstral descriptor are proposed for 3D model retrieval. First, the set of voxels covering the surface of a 3D model is divided into 8 octants. 3D-DFT spectral features and 3D-DFT cepstral features are then extracted from each octant and combined to form the local 3D-DFT spectral descriptor and the local 3D-DFT cepstral descriptor. Experiments conducted on the Princeton Shape Benchmark (PSB) database show that the proposed local 3D-DFT descriptors outperform other 3D-DFT descriptors.


Introduction
3D models are widely used in computer-aided design, computer animation, movie production, industrial product design, digital libraries, and so on. As the number of 3D models grows rapidly, there is an increasing demand for 3D model retrieval systems. Typically, keyword-based search systems are provided to find similar images, videos, or 3D models. To facilitate 3D model retrieval using a keyword search mechanism, each 3D model must be labeled with appropriate keywords. However, manually annotating each 3D model is time-consuming, and it is sometimes hard to find appropriate keywords for specific models. To overcome this problem, content-based 3D model retrieval has become widely used to find similar 3D models.
Transform-based descriptors are extracted by first mapping the 3D data into the frequency domain. These frequency-domain representations include probability density-based shape descriptors [21], the concrete radialized spherical projection [22], the spherical wavelet transform [23], the 3D discrete curvelet transform [24], rotation-invariant spherical harmonics [1,25], spherical harmonics [26], the 3D Fourier transform [27], the angular radial transform (ART) [28], etc. The effectiveness of these descriptors relies heavily on the resolution of the voxelization (voxel decomposition) of the 3D model and on how discriminative features are extracted from the set of transformed coefficients.
Graph-based descriptors have the potential of describing the geometrical and topological properties of a 3D model in a more faithful way, particularly for deformable 3D models. The methods include extended cone-curvature [53], medial scaffold [54], Reeb graph [55], skeleton-based descriptors [56], etc.
In our prior work [57], we designed a 3D model retrieval system using 3D discrete Fourier transform (3D-DFT) spectral features and cepstral features. Experimental results showed that these descriptors achieve promising performance. In this paper, local 3D-DFT spectral and cepstral features are extracted from different local regions of the 3D model and combined for 3D model retrieval.

3D-DFT Spectrum
The 3D-DFT spectrum is obtained by applying the 3D-DFT to the voxels of a 3D model V(x, y, z), defined as follows [27]:

F(u, v, w) = Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} Σ_{z=0}^{N−1} V(x, y, z) e^{−j2π(ux + vy + wz)/N},

where u, v, and w are the 3D-DFT frequency indices (0 ≤ u, v, w < N). The magnitudes of the low-frequency coefficients (except |F(0, 0, 0)|) are selected to form the 3D-DFT spectral features.
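The computation above can be sketched with NumPy's FFT (a minimal illustration; the function name, the toy voxel grid, and the cutoff M are our own choices, and only the non-negative low-frequency corner of the spectrum is kept for brevity):

```python
import numpy as np

def dft_spectral_features(V, M):
    """Magnitudes of the M*M*M lowest-frequency 3D-DFT coefficients,
    excluding the DC term |F(0, 0, 0)|."""
    F = np.fft.fftn(V)                    # 3D-DFT of the voxel grid
    low = np.abs(F)[:M, :M, :M].ravel()   # low-frequency corner
    return low[1:]                        # drop |F(0, 0, 0)|

# toy voxel grid standing in for a voxelized model
V = np.zeros((8, 8, 8))
V[2:6, 2:6, 2:6] = 1.0
feat = dft_spectral_features(V, M=2)      # length M^3 - 1 = 7
```

Here `np.fft.fftn` computes the 3D-DFT directly on the N×N×N voxel array.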

3D-DFT Cepstrum
By applying the 3D-IDFT to the magnitude spectrum |F(u, v, w)|, we can obtain the 3D-DFT cepstrum C(a, b, c):

C(a, b, c) = (1/N³) Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} Σ_{w=0}^{N−1} |F(u, v, w)| e^{j2π(ua + vb + wc)/N},

where a, b, and c are the 3D-DFT quefrency (cepstral frequency) indices. The magnitudes of the low-quefrency coefficients are selected to form the 3D-DFT cepstral features.
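A corresponding NumPy sketch (the quefrency cutoff D is illustrative); since V is real-valued, |F| is conjugate-symmetric and the inverse transform is real up to rounding error:

```python
import numpy as np

def dft_cepstrum(V):
    """3D-DFT cepstrum: inverse 3D-DFT of the magnitude spectrum |F|."""
    F = np.fft.fftn(V)               # 3D-DFT of the voxel grid
    return np.fft.ifftn(np.abs(F))   # indices (a, b, c) are quefrencies

# toy voxel grid
V = np.zeros((8, 8, 8))
V[3:5, 3:5, 3:5] = 1.0
C = dft_cepstrum(V)
feat = np.abs(C)[:2, :2, :2].ravel()   # low-quefrency features (D = 2)
```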

3D-DFT Subband Spectrum
To obtain the 3D-DFT subband spectral features, we first decompose the 3D-DFT spectrum into K (K = W×H×L) subbands, notated by SB_{p,q,r} (0 ≤ p < W, 0 ≤ q < H, 0 ≤ r < L), using uniform subband decomposition or logarithmic subband decomposition along the x-axis, y-axis, and z-axis, respectively [57]. Then, the sum of the magnitude spectrum within each subband, notated by M_{p,q,r}, is calculated to yield the 3D-DFT subband spectral features:

M_{p,q,r} = Σ_{(u,v,w) ∈ SB_{p,q,r}} |F(u, v, w)|.
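Under uniform subband decomposition, the per-subband sums M_{p,q,r} can be sketched as follows (W, H, L and the toy grid are illustrative choices):

```python
import numpy as np

def subband_spectral_features(V, W, H, L):
    """Uniform subband decomposition: split the N*N*N magnitude spectrum
    into W*H*L equal blocks and sum |F(u, v, w)| within each block."""
    mag = np.abs(np.fft.fftn(V))
    N = V.shape[0]
    su, sv, sw = N // W, N // H, N // L   # subband sizes per axis
    return np.array([mag[p*su:(p+1)*su, q*sv:(q+1)*sv, r*sw:(r+1)*sw].sum()
                     for p in range(W) for q in range(H) for r in range(L)])

V = np.zeros((8, 8, 8))
V[2:6, 2:6, 2:6] = 1.0
feats = subband_spectral_features(V, W=2, H=2, L=2)   # K = 8 features
```

Because the blocks partition the spectrum, the subband sums add up to the total magnitude, which is a convenient sanity check.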

Proposed 3D Model Retrieval Approach
Prior 3D-DFT feature extraction methods extract the spectral/cepstral features from the spectrum/cepstrum of the whole 3D model, without considering its local properties. Thus, in this paper, we extract local spectral/cepstral features to capture the local characteristics of different parts of a 3D model. First, the principal planes method [42] is used for pose alignment of a 3D model. Then, the 3D model is divided into eight octants, and 3D-DFT spectral/cepstral features [27,57] are extracted from each local octant. Finally, the spectral and cepstral features extracted from all octants are combined to search for similar 3D models.

3D Model Alignment
In this paper, we use the principal planes method [42] to align the pose of each 3D model. First, the smallest bounding box that circumscribes the 3D model is divided into a voxel grid of size N×N×N (see Fig. 1); in this paper, N = 64. If a polygonal surface passes through the voxel with coordinates (x, y, z), this voxel is defined as an opaque voxel, notated by V(x, y, z) = 1; otherwise, it is defined as a transparent voxel, notated by V(x, y, z) = 0. As a result, the voxelization operation converts the 3D mesh model into a discrete 3D function with regularly sampled voxels in 3D coordinates. Second, the model's mass center is moved to (N/2, N/2, N/2), and the 3D model is scaled such that the average distance from all opaque voxels to the mass center becomes N/4. Third, the pose of the 3D model is aligned by grid-based principal component analysis (GPCA) [33]. Thus, the features extracted from the aligned 3D model are invariant to rotation, scaling, and translation.
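The translation and scaling steps can be sketched as follows (a simplified illustration; the GPCA pose-alignment step [33] is omitted, and the function returns the required shift and scale factor rather than resampling the grid):

```python
import numpy as np

def centering_and_scale(V):
    """Shift that moves the opaque-voxel mass centre to (N/2, N/2, N/2)
    and the factor that scales the mean centre distance to N/4."""
    N = V.shape[0]
    coords = np.argwhere(V > 0).astype(float)    # opaque voxel coords
    centre = coords.mean(axis=0)                 # mass centre
    shift = np.full(3, N / 2.0) - centre
    mean_dist = np.linalg.norm(coords - centre, axis=1).mean()
    scale = (N / 4.0) / mean_dist
    return shift, scale

V = np.zeros((8, 8, 8))
V[2:6, 2:6, 2:6] = 1.0
shift, scale = centering_and_scale(V)
```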

Inverse Distance Transform
The voxelization operation encodes each voxel in the 3D Cartesian coordinate system with the value 1 (denoting an opaque voxel) or 0 (denoting a transparent voxel). Such a binary encoding will produce severe high-frequency artifacts due to the blocky structure of the binary voxel representation. To suppress these artifacts, we employ the inverse distance transform (IDT) proposed by Dutagaci et al. [20] to obtain a smoothed voxel representation. The IDT value associated with a voxel V is defined as follows:

IDT(V) = 1 / (1 + DT(V)),

where DT(V) is the distance between V and its nearest opaque voxel in 3D space:

DT(V) = min_{V_O ∈ S_O} d(V, V_O),

where d(V, V_O) is the Euclidean distance between voxel V and an opaque voxel V_O, and S_O is the set of all opaque voxels. Therefore, the voxels on the surface of the 3D model take the largest IDT values, and the IDT value decreases smoothly as the voxels move away from the model surface. Fig. 2 gives an example showing how the IDT operation is used to obtain a smoothed voxel representation.
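A brute-force sketch of the transform (workable only for small grids; the smooth decreasing mapping IDT = 1/(1 + DT) is one plausible choice consistent with the description above, not necessarily the exact form in [20]):

```python
import numpy as np

def inverse_distance_transform(V):
    """Brute-force IDT: DT(.) is the Euclidean distance to the nearest
    opaque voxel; IDT = 1/(1 + DT) decreases smoothly away from the
    surface (an assumed monotone mapping, see lead-in)."""
    opaque = np.argwhere(V > 0).astype(float)   # set of opaque voxels
    idt = np.empty(V.shape)
    for idx in np.ndindex(V.shape):
        dt = np.sqrt(((opaque - np.array(idx, dtype=float)) ** 2)
                     .sum(axis=1)).min()
        idt[idx] = 1.0 / (1.0 + dt)
    return idt

V = np.zeros((6, 6, 6))
V[2:4, 2:4, 2:4] = 1.0
idt = inverse_distance_transform(V)
```

For production use, a linear-time distance transform (e.g. SciPy's `scipy.ndimage.distance_transform_edt` on the complement grid) would replace the O(N³·|S_O|) loop.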

3D Model Decomposition
To extract the local features, we divide the set of voxels of a 3D model in 3D space into 8 octants, represented as follows:

Q_1 = {V(x, y, z) : 0 ≤ x < N/2, 0 ≤ y < N/2, 0 ≤ z < N/2},
Q_2 = {V(x, y, z) : N/2 ≤ x < N, 0 ≤ y < N/2, 0 ≤ z < N/2},
...
Q_8 = {V(x, y, z) : N/2 ≤ x < N, N/2 ≤ y < N, N/2 ≤ z < N}

(see Fig. 3). Thus, each octant contains some specific voxels (revealing local properties) of a 3D model. Since the size of the bounding box is N×N×N, the size of each local octant is N/2×N/2×N/2. For each local octant, the 3D-DFT spectral and 3D-DFT cepstral features will be extracted and concatenated to form the local 3D-DFT spectral descriptor and the local 3D-DFT cepstral descriptor.

Fig. 3. Decomposition of a 3D model into 8 octants.
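The octant decomposition amounts to slicing the N×N×N grid into eight (N/2)×(N/2)×(N/2) blocks, e.g.:

```python
import numpy as np

def split_octants(V):
    """Divide an N*N*N voxel grid into its 8 (N/2)^3 octants Q_1..Q_8."""
    h = V.shape[0] // 2
    return [V[i*h:(i+1)*h, j*h:(j+1)*h, k*h:(k+1)*h]
            for i in (0, 1) for j in (0, 1) for k in (0, 1)]

V = np.zeros((8, 8, 8))
V[2:6, 2:6, 2:6] = 1.0
octants = split_octants(V)
```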

Local 3D-DFT Spectrum
The local 3D-DFT spectrum is obtained by applying the 3D-DFT to the voxels within each octant Q_i (1 ≤ i ≤ 8) to get the corresponding local spectrum:

F_i(u, v, w) = Σ_{x=0}^{N/2−1} Σ_{y=0}^{N/2−1} Σ_{z=0}^{N/2−1} V_i(x, y, z) e^{−j2π(ux + vy + wz)/(N/2)},

where V_i denotes the voxels of the i-th octant in local coordinates, u, v, and w are the 3D-DFT frequency indices (0 ≤ u, v, w < N/2), and F_i is the spectrum of the i-th octant. The magnitudes of the M×M×M low-frequency coefficients (except |F_i(0, 0, 0)|) of each local spectrum F_i are concatenated to form the local 3D-DFT spectral descriptor. Thus, the length of the local 3D-DFT spectral descriptor is 8(M³ − 1).
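Putting the pieces together, a sketch of the local spectral descriptor (M and the toy grid size are illustrative):

```python
import numpy as np

def local_spectral_descriptor(V, M):
    """Concatenate, over the 8 octants, the magnitudes of the M^3
    low-frequency 3D-DFT coefficients minus the DC term |F_i(0, 0, 0)|;
    total length 8 * (M^3 - 1)."""
    h = V.shape[0] // 2
    parts = []
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                Q = V[i*h:(i+1)*h, j*h:(j+1)*h, k*h:(k+1)*h]
                Fi = np.abs(np.fft.fftn(Q))     # local spectrum |F_i|
                parts.append(Fi[:M, :M, :M].ravel()[1:])
    return np.concatenate(parts)

V = np.zeros((8, 8, 8))
V[2:6, 2:6, 2:6] = 1.0
desc = local_spectral_descriptor(V, M=2)        # length 8 * 7 = 56
```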

Local 3D-DFT Cepstrum
By applying the 3D-IDFT to each local magnitude spectrum |F_i(u, v, w)|, we can obtain the local 3D-DFT cepstrum C_i(a, b, c), where a, b, and c are the 3D-DFT quefrency indices. The magnitudes of the D×D×D low-quefrency coefficients (except |C_i(0, 0, 0)|) of each local cepstrum C_i are concatenated to form the local 3D-DFT cepstral descriptor. Thus, the length of the local 3D-DFT cepstral descriptor is 8(D³ − 1).
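And analogously for the local cepstral descriptor (again a sketch with an illustrative D):

```python
import numpy as np

def local_cepstral_descriptor(V, D):
    """Per octant: inverse-DFT the magnitude spectrum, keep |C_i| at the
    D^3 low-quefrency indices minus C_i(0, 0, 0); length 8 * (D^3 - 1)."""
    h = V.shape[0] // 2
    parts = []
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                Q = V[i*h:(i+1)*h, j*h:(j+1)*h, k*h:(k+1)*h]
                Ci = np.fft.ifftn(np.abs(np.fft.fftn(Q)))  # local cepstrum
                parts.append(np.abs(Ci)[:D, :D, :D].ravel()[1:])
    return np.concatenate(parts)

V = np.zeros((8, 8, 8))
V[2:6, 2:6, 2:6] = 1.0
desc = local_cepstral_descriptor(V, D=2)        # length 8 * 7 = 56
```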

3D Model Retrieval
Let f_LS denote the feature vector corresponding to the local 3D-DFT spectral descriptor of the query model, and f_LS,t denote the feature vector corresponding to the local 3D-DFT spectral descriptor of the t-th matching model in the database. The distance between the query model and the t-th matching model, evaluated in terms of the local 3D-DFT spectral descriptor, is defined as follows:

D_LS(t) = Σ_m |f_LS(m) − f_LS,t(m)|,

where f_LS(m) and f_LS,t(m) denote the m-th feature values of f_LS and f_LS,t, respectively.
Similarly, let f_LC and f_LC,t denote the local 3D-DFT cepstral descriptors of the query model and the t-th matching model, respectively. The distance between the query model and the t-th matching model, evaluated in terms of the local 3D-DFT cepstral descriptor, is given as follows:

D_LC(t) = Σ_d |f_LC(d) − f_LC,t(d)|,

where f_LC(d) and f_LC,t(d) denote the d-th feature values of f_LC and f_LC,t, respectively. Therefore, the models most similar to the query can be found as those having the smallest distance values.
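Retrieval then reduces to sorting database models by ascending descriptor distance; the sketch below assumes the sum-of-absolute-differences (L1) form, which is one reading of the distance defined above:

```python
import numpy as np

def rank_models(query, db):
    """Sort database models by ascending L1 distance to the query
    descriptor; returns the ranked indices and the distances."""
    dists = np.array([np.abs(query - f).sum() for f in db])
    return np.argsort(dists), dists

query = np.array([0.0, 0.0])
db = [np.array([1.0, 1.0]), np.array([0.0, 0.0]), np.array([5.0, 5.0])]
order, dists = rank_models(query, db)   # nearest model first: index 1
```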

Experimental Results
To demonstrate the effectiveness of the proposed local descriptors, experiments have been conducted on the Princeton Shape Benchmark (PSB) database [58]. The PSB database contains 1814 models (161 classes), divided into 907 training models (90 classes) and 907 testing models (92 classes). The experiments were performed on the 907 testing models (see Fig. 4 and Fig. 5) to keep the results comparable with other methods. The discounted cumulative gain (DCG) [59] is employed to compare the performance of different descriptors. The DCG at the r-th rank is defined as follows:

DCG_1 = L_1,
DCG_r = DCG_{r−1} + L_r / log₂(r), r > 1,

where L_r = 1 if the query model and the r-th model in the ranked retrieval list have the same class label, and L_r = 0 otherwise. The overall DCG score for a query model is defined as DCG_T, where T is the number of models in the database. Clearly, if the models appearing at the front of the retrieval list have the same class label as the query, a larger DCG score is obtained; conversely, if the models with the same class label as the query appear at the rear of the retrieval list, a smaller DCG score results. In our experiments, each model in the database is presented as a query to evaluate the average DCG score over all query models:

DCG_avg = (1/T) Σ_{t=1}^{T} DCG_T(t),

where DCG_T(t) denotes the DCG score obtained when the t-th model serves as the query.

Fig. 4. Some 3D model classes in the PSB database [58].
Fig. 5. The biplane class in the PSB database [58].
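The DCG recursion can be implemented directly (a sketch; `labels` is the relevance list L_1..L_T for one query):

```python
import numpy as np

def dcg_score(labels):
    """DCG_1 = L_1; DCG_r = DCG_{r-1} + L_r / log2(r) for r >= 2.
    `labels` holds L_r (1 if the r-th retrieved model shares the
    query's class, else 0); returns the final DCG_T."""
    dcg = float(labels[0])
    for r in range(2, len(labels) + 1):
        dcg += labels[r - 1] / np.log2(r)
    return dcg

# a relevant model at ranks 1 and 2, irrelevant at rank 3
score = dcg_score([1, 1, 0])   # 1 + 1/log2(2) = 2.0
```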
Table 1 compares the performance of different 3D-DFT descriptors, including (1) the 3D-DFT spectrum (notated by 3D-DFT-S), (2) the 3D-DFT cepstrum (notated by 3D-DFT-C), (3) the 3D-DFT subband spectrum with uniform subband decomposition (notated by 3D-DFT-S-USD), (4) the 3D-DFT subband spectrum with logarithmic subband decomposition (notated by 3D-DFT-S-LSD), (5) the 3D-DFT subband cepstrum with uniform subband decomposition (notated by 3D-DFT-C-USD), (6) the 3D-DFT subband cepstrum with logarithmic subband decomposition (notated by 3D-DFT-C-LSD), and the proposed local 3D-DFT spectrum (notated by 3D-DFT-LS) and local 3D-DFT cepstrum (notated by 3D-DFT-LC). We can see that the proposed local 3D-DFT descriptors outperform their corresponding global 3D-DFT descriptors. Table 2 compares the proposed local 3D-DFT descriptors with and without performing the IDT operation before feature extraction. It can be seen that by performing the IDT operation to obtain a smoothed voxel representation of each 3D model, the high-frequency artifacts are removed and higher retrieval accuracy is achieved.

Table 2. Comparison of the proposed local 3D-DFT descriptors with/without performing the IDT operation. The parentheses show the parameter settings.

Conclusions
In this paper, a local 3D-DFT spectral descriptor and a local 3D-DFT cepstral descriptor have been proposed for 3D model retrieval. First, the set of voxels of a 3D model in 3D space is divided into 8 octants. 3D-DFT spectral features and 3D-DFT cepstral features are extracted from each octant and combined to form the local 3D-DFT spectral descriptor and the local 3D-DFT cepstral descriptor. Experiments conducted on the Princeton Shape Benchmark (PSB) database have shown that the proposed local 3D-DFT descriptors outperform other 3D-DFT descriptors.