Combination of Variant 2D Cepstral Features for 3D Model Retrieval

In this paper, we propose a 3D model retrieval approach using variant 2D cepstral features. First, six projection images, which represent the depth information from the model surface to the projection planes corresponding to six different viewing directions, are generated. Then, variant 2D cepstral features, obtained by using different subband decomposition methods, are extracted from each projection image to describe each 3D model. Experiments conducted on the Princeton Shape Benchmark (PSB) database show that the proposed 2D cepstral features outperform other state-of-the-art descriptors in terms of the discounted cumulative gain (DCG) value.


Introduction
Advances in modeling, capturing, processing, and display hardware and software have led to the proliferation of 3D models. 3D models play an important role in a number of applications, such as computer animation, movie production, computer games, virtual reality, and computer-aided design. Thus, it is necessary to design an effective 3D model retrieval system that enables users to search 3D model databases for similar 3D models. As a result, content-based 3D model retrieval has become an important research topic.
The main challenge for a content-based 3D model retrieval system is how to extract representative features that effectively discriminate the shapes of various 3D models (1). In general, there are two main paradigms for 3D model retrieval: the graph-based approach and the feature-vector approach. The graph-based paradigm uses a graph to represent the structural information of a 3D model. A graph matching scheme is then employed to calculate the similarity between two 3D models. The feature-vector paradigm represents each 3D model by a number of numerical descriptors (a feature vector), which generally capture the geometrical and topological properties of a 3D object.
The graph-based paradigm describes how the model components are linked together. The best-known methods include the multi-resolution Reeb graph (2-4), the skeletal graph (5,6), and the attributed graph (7). This paradigm is robust to 3D shape deformation and suitable for partial matching between models. However, it is computationally intensive to obtain the graph-based representation of a 3D model. Further, such a representation is generally sensitive to the fine components of 3D models.
View-based descriptors are generally obtained by projecting a 3D model onto a number of 2D planes from different views. Discriminative features extracted from these 2D projection planes are combined to index similar 3D models. These 2D planes can be represented either by binary images representing the silhouettes from different views (32-34), or by gray-level images representing the curvature information (35) or the depth information (36-43). One advantage of view-based descriptors is that it is easy to design a query interface that supports a 2D sketch for 3D model retrieval (1,32). The drawback is that rotation invariance has to be achieved by pose normalization prior to 2D projection, by extracting rotation-invariant features, or by matching 2D feature descriptors over many different alignments simultaneously.
In this paper, we propose a 3D model retrieval method using variant 2D cepstral features. Section 2 describes the proposed 3D model retrieval method. Section 3 presents experimental results that show the effectiveness of the proposed 2D cepstral features. Finally, conclusions are given in Section 4.

3D Model Retrieval Using Variant 2D Cepstral Features
To extract 2D cepstral features from each 3D model, we first align the pose and normalize the scale of each 3D model. Second, the smallest bounding cube that circumscribes the 3D model is divided into a voxel grid. Third, six projection images corresponding to six viewing planes are generated. Each pixel value represents the depth value of a voxel projected onto the corresponding viewing plane. Finally, variant 2D cepstral features, corresponding to different subband decomposition methods, are extracted and combined for 3D model retrieval.

Pose Alignment and Scale Normalization of 3D Models
To make the features extracted from each 3D model insensitive to rotation, translation, and scale variations, we have to align the pose and normalize the scale of each 3D model. In this paper, we use grid-based principal component analysis (GPCA) (42) to align the pose of each input 3D model. The smallest bounding cube that circumscribes the aligned 3D model is then decomposed into a voxel grid of size 2G×2G×2G (in this paper, G = 64). If any part of the 3D mesh lies within the voxel with coordinates (x, y, z), this voxel is defined as an opaque voxel, denoted by V(x, y, z) = 1; otherwise, it is defined as a transparent voxel, denoted by V(x, y, z) = 0. In addition, to make the extracted features insensitive to translation and scaling variations, we move the mass center of each 3D model to coordinates (0, 0, 0) and normalize the scale of each 3D model such that the average distance from all opaque voxels to the mass center is G/2.
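The translation and scale normalization described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in this paper: the function name `normalize_opaque_voxels` is hypothetical, and the mass center is assumed to be the mean of the opaque voxel coordinates.

```python
import numpy as np

def normalize_opaque_voxels(coords, G=64):
    """Center opaque voxel coordinates and rescale them.

    coords: (n, 3) array of opaque voxel coordinates (assumed input format).
    After the call, the mass center is at the origin and the average
    distance from the voxels to the mass center equals G / 2.
    """
    coords = np.asarray(coords, dtype=float)
    # Assumption: the mass center is the mean of the opaque voxel coordinates.
    centered = coords - coords.mean(axis=0)
    avg_dist = np.linalg.norm(centered, axis=1).mean()
    if avg_dist == 0:
        raise ValueError("all voxels coincide; scale is undefined")
    return centered * (G / 2.0) / avg_dist
```

Because both steps depend only on the voxel set itself, two copies of a model that differ by a shift or a uniform scaling map to the same normalized coordinates.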

Projection Images Generation
Once the pose of a 3D model is aligned, each opaque voxel is projected onto six viewing planes, corresponding to six different views (front, rear, left, right, top, and bottom), to get six projection images in which each pixel value represents a depth value. The depth value, which describes the distance from each opaque voxel to the viewing plane, captures the depth information of the model's surface relative to each viewing plane (36). Thus, each projection image can be represented by a gray-scale image from which variant 2D cepstral features are extracted for 3D model matching. Let the six projection images be denoted by I_k, k = 1, 2, …, 6. Then, the gray value of each pixel on these projection images is computed as follows:
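As a hedged illustration of the projection step, the sketch below assumes a simple depth-buffer rule: along each viewing ray, the pixel takes the value S minus the index of the first opaque voxel (S being the grid side length), and 0 if the ray meets no opaque voxel. This rule, the view ordering, and the name `depth_images` are assumptions made for illustration and may differ from the formula used in this paper.

```python
import numpy as np

def depth_images(V):
    """Build six depth images from a binary voxel grid.

    V: (S, S, S) array with 1 for opaque and 0 for transparent voxels.
    Returns six (S, S) images, two per axis (one for each viewing side).
    Assumed convention: nearer surfaces get larger values, empty rays get 0.
    """
    V = np.asarray(V)
    S = V.shape[0]
    images = []
    for axis in range(3):
        for flip in (False, True):
            # Flipping the axis views the grid from the opposite side.
            vol = np.flip(V, axis=axis) if flip else V
            hit = vol.any(axis=axis)         # does the ray meet the model?
            first = vol.argmax(axis=axis)    # index of the first opaque voxel
            images.append(np.where(hit, S - first, 0))
    return images
```

With G = 64, the grid side is S = 2G = 128, so each of the six images has 128×128 pixels.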

Extraction of Variant 2D Cepstral Features
For each projection image, variant 2D cepstral features corresponding to different subband decomposition methods will be extracted. Note that the radial direction and angular direction denote, respectively, the frequency variation and edge direction of the projection image.
In our previous work (43), we proposed uniform subband decomposition (USD), in which the uniform radial division (URD) method is used to divide the spectrum into a number of concentric shells whose radii are equally spaced. In addition, two angular division methods, called generic angular division (GAD) and complement angular division (CAD), are designed to divide the magnitude spectrum into equal-angle sectors. That is, the angular direction of the spectrum is also uniformly divided (see Fig. 2). For GAD, a frequency variable …
In this paper, we further explore logarithmic subband decomposition (LSD), in which the logarithmic radial division (LRD) method divides the magnitude spectrum into concentric shells such that the logarithms of the radii form an arithmetic sequence. That is, the radii of the concentric shells can be represented by 2^0, 2^(1/2), 2^1, 2^(3/2), …, 2^(R/2) = G (R = 12 if G = 64, thus M = R + 1 = 13). For URD and LRD (see Fig. 3), the spectrum is divided uniformly and logarithmically, respectively, along the radial direction. For URD, the frequency variable (u, v) …
In this paper, we set M = 16 and N = 16 for USD, and M = 13 and N = 16 for LSD. By combining the angular division methods (GAD and CAD) with the radial division methods (URD and LRD), four (2×2) distinct subband decomposition methods can be constructed to divide the 2D spectrum into subbands (see Table 1 for a list and Fig. 4 for an illustration of these decomposition methods). For each subband, the sum of the magnitudes of all DFT coefficients within the subband is computed:
By taking the inverse 2D-DFT of M_j(r, θ), we can get the 2D cepstrum C_j(p, q) of the j-th projection image:
where (p, q) is the 2D cepstral quefrency index. The magnitudes of these M×N cepstral coefficients constitute the 2D cepstral feature vector:
In this paper, the 2D cepstral feature vectors extracted from all projection images are concatenated to constitute the 2D cepstral feature vector of a 3D model:
Given a 3D model, we can obtain different 2D cepstral feature vectors using different subband decomposition methods. Therefore, four different feature vectors, which describe different frequency variations in the magnitude spectra of the projection images, are used to represent each 3D model:
Similarly, let y^URGAD, y^URCAD, y^LRGAD, and y^LRCAD denote, respectively, the feature vectors extracted from a target model in the database. The distance between each feature vector of the query model (q) and that of the matching model (m) for each subband decomposition method is defined as follows:
For uniform subband decomposition (USD) (43), the distance between the query model and the matching model is defined as the sum of the distances computed using the URGAD and URCAD decomposition methods.
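The logarithmic radial division and the cepstrum step can be sketched as follows. This is an illustrative approximation under stated assumptions, not the exact procedure of this paper: plain equal-angle sectors over the full circle stand in for the GAD and CAD rules, the inverse 2D-DFT is applied directly to the subband sums as the text states, and the function names `lrd_radii`, `subband_sums`, and `cepstral_features` are hypothetical.

```python
import numpy as np

def lrd_radii(G=64):
    """Shell radii 2^0, 2^(1/2), ..., 2^(R/2) = G, so R = 2 * log2(G)."""
    R = int(round(2 * np.log2(G)))
    return 2.0 ** (np.arange(R + 1) / 2.0)

def subband_sums(image, radii, N=16):
    """Sum DFT magnitudes over each (shell, sector) cell of the spectrum.

    Assumption: N equal-angle sectors cover the full circle [0, 2*pi).
    """
    mag = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    h, w = mag.shape
    v, u = np.indices((h, w))
    u = u - w // 2                      # horizontal frequency, 0 at the center
    v = v - h // 2                      # vertical frequency, 0 at the center
    r = np.hypot(u, v)
    theta = np.mod(np.arctan2(v, u), 2 * np.pi)
    # Shell i holds radii[i-1] < r <= radii[i]; shell 0 holds r <= radii[0].
    shell = np.searchsorted(radii, r, side="left")
    sector = np.minimum((theta * N / (2 * np.pi)).astype(int), N - 1)
    sums = np.zeros((len(radii), N))
    inside = shell < len(radii)         # drop frequencies beyond the last shell
    np.add.at(sums, (shell[inside], sector[inside]), mag[inside])
    return sums

def cepstral_features(sums):
    """Magnitudes of the inverse 2D-DFT of the M x N subband matrix."""
    return np.abs(np.fft.ifft2(sums)).ravel()
```

For G = 64 this yields 13 radii, so a 128×128 projection image gives a 13×16 subband matrix and a 208-dimensional feature vector per image under these assumptions.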

Experimental Results
To demonstrate the effectiveness of the proposed 2D cepstral features, experiments were conducted on the Princeton Shape Benchmark (PSB) database (44), which contains 1814 models divided into 907 training models (90 classes) and 907 test models (92 classes). The discounted cumulative gain (DCG) (22) is used to compare the performance of different descriptors. The DCG value at the r-th rank is defined as follows:
$$\mathrm{DCG}_r = \begin{cases} L_1, & r = 1 \\ \mathrm{DCG}_{r-1} + \dfrac{L_r}{\log_2 r}, & \text{otherwise} \end{cases} \tag{27}$$
where L_r = 1 if the r-th model in the ranked retrieval list and the query model have the same class label; otherwise, L_r = 0. The overall DCG value for a query model is defined as DCG_kmax, where kmax is the total number of models in the database. Thus, if the models at the head of the retrieval list have the same class label as the query model, a larger DCG value is obtained. On the other hand, if the models with the same class label as the query model are located at the tail of the retrieval list, a smaller DCG value is obtained.
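The behavior of the DCG measure can be checked with a short sketch. It assumes the recursive form commonly used with the PSB, DCG_1 = L_1 and DCG_r = DCG_{r-1} + L_r / log2(r) for r > 1; the function name `dcg` and the list-of-labels input format are hypothetical.

```python
import math

def dcg(labels):
    """Discounted cumulative gain of a ranked retrieval list.

    labels: sequence of 0/1 values, where labels[r-1] is L_r, i.e. 1 if the
    model at rank r has the same class label as the query and 0 otherwise.
    Assumed recursion: DCG_1 = L_1, DCG_r = DCG_{r-1} + L_r / log2(r).
    """
    total = 0.0
    for r, hit in enumerate(labels, start=1):
        # Rank 1 is not discounted; later ranks are divided by log2(r).
        total += hit if r == 1 else hit / math.log2(r)
    return total
```

For example, `dcg([1, 1, 0])` is larger than `dcg([0, 1, 1])`, which matches the statement that correct matches at the head of the list raise the score more than matches at the tail.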
In the experiments, each model in the database is presented as a query to evaluate the average DCG value. Table 2 shows the retrieval performance of the proposed 2D cepstral features. In this table, for URGAD(16×16) and URCAD(16×16), both the angular and radial directions are equally divided into 16 sections; for LRGAD(13×16) and LRCAD(13×16), the angular direction is uniformly divided into 16 sections, whereas the radius is logarithmically divided into 13 segments. From this table, we can see that the 2D cepstral features extracted using logarithmic subband decomposition outperform their counterparts using uniform subband decomposition (LRGAD(13×16) vs. URGAD(16×16), LRCAD(13×16) vs. URCAD(16×16), LSD vs. USD). Furthermore, the combination of LSD and USD outperforms each individual one (ULSD vs. USD, ULSD vs. LSD). Table 3 compares the retrieval results of the proposed ULSD descriptor with other state-of-the-art descriptors. The experimental results show that the proposed 2D cepstral descriptor (ULSD) outperforms the other descriptors in terms of DCG value.

Conclusions
In this paper, variant 2D cepstral features are explored for 3D model retrieval. Initially, the pose and scale of each 3D model are aligned and normalized such that the extracted features are less sensitive to rotation, translation, and scaling variations. Next, six projection images, which describe the depth value of each 3D voxel from six different views, are generated and represented as gray-level images. Different subband decomposition methods, including URGAD, URCAD, LRGAD, and LRCAD, are used to obtain variant 2D cepstral features from each projection image. Experiments on the PSB database have shown that the proposed 2D cepstral descriptor (ULSD) outperforms other state-of-the-art descriptors in terms of DCG value.
The distance corresponding to logarithmic subband decomposition (LSD) is defined as the sum of the distances computed using the LRGAD and LRCAD decomposition methods. The distances computed using USD and LSD can be combined to get the combined distance:

Table 1. A list of different subband decomposition methods.

Table 2. Retrieval results of the proposed descriptors in terms of DCG value.

Table 3. Comparison of the proposed descriptor with other descriptors in terms of the DCG value. Note that the DCG values for approaches marked with * are from (22).
Proposed ULSD 74.64