Face Sketch Synthesis System Utilizing Both Global Structure and Local Detailed Texture Components

This study developed a facial sketch synthesis system based on the two-dimensional direct combined model (2DDCM) approach, which employs a collection of pairwise photo/sketch training samples. The proposed synthesis framework addresses the following key issues. First, we directly combine each pairwise photo/sketch sample in a concatenated form in order to completely preserve their relationship. Second, photo and sketch images are represented as two-dimensional matrices instead of vectors in order to preserve the facial geometry. Third, both the global facial geometry and the local detailed textures are included in the proposed synthesis framework. Experiments demonstrate that our approach can synthesize high-quality facial sketches from unseen photos.


Introduction
In recent years, facial sketch synthesis has been widely applied in law enforcement and digital entertainment. For example, a witness describes the appearance of a suspect, and an artist draws a corresponding sketch for visualization. Sketches are also used in digital entertainment, where stylized facial sketches can be used to attract attention. There are two types of synthesis approaches for generating facial sketches: image-based and exemplar-based. Image-based methods [2,5] generate sketch strokes by using image-processing operations. For example, the framework in [2] generates a sketch by applying a bilateral filter to the color distribution of the input photo, while the work in [5] combines the properties of image tone and sketch stroke to synthesize facial sketches. However, the results produced by these approaches lack the particular drawing styles found in sketches drawn by talented artists.
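Image-based stroke generation of this kind can be illustrated with a minimal difference-of-Gaussians (DoG) edge sketch. The code below is only an illustrative sketch of the general idea, not the actual methods of [2] or [5]; the function names, the blur radii, and the stroke-contrast factor are our own assumptions.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur implemented with plain NumPy 1-D convolutions."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Blur rows, then columns; reflect padding keeps the image size fixed.
    pad = np.pad(img, ((0, 0), (radius, radius)), mode="reflect")
    img = np.stack([np.convolve(row, kernel, mode="valid") for row in pad])
    pad = np.pad(img, ((radius, radius), (0, 0)), mode="reflect")
    img = np.stack([np.convolve(col, kernel, mode="valid") for col in pad.T]).T
    return img

def dog_sketch(photo, sigma=1.0, k=1.6):
    """Difference-of-Gaussians: dark strokes appear where intensity changes."""
    dog = gaussian_blur(photo, sigma) - gaussian_blur(photo, k * sigma)
    strokes = 1.0 - np.clip(dog * 10.0, 0.0, 1.0)  # invert: edges -> dark lines
    return strokes
```

A flat input yields a blank (all-white) sketch, while intensity edges produce dark stroke pixels.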
Exemplar-based methods, on the other hand, require a large number of pairwise photo/sketch training samples. These samples are used to learn how to generate (or exaggerate) a correct sketch for an unseen input photo. A typical learning method is the non-parametric Markov random field (MRF) [9,11,12]. For example, the work in [1] used the E-HMM approach to model the relation between photo and sketch, dividing the human face into five parts (forehead, eyes, nose, mouth, and chin) in order to preserve detailed textures. The work in [11] used an MRF graph to establish the correlation between photo and sketch and to constrain the smoothness between neighboring patches of the synthesized sketch. The work in [12] was also based on the MRF approach, but handled varying lighting and pose conditions by using difference-of-Gaussians (DoG) representations. The work in [9] integrated a parametric approach into the MRF structure.
These exemplar-based frameworks provide high-quality synthesized sketches; however, the results are sensitive to the number of training samples. For example, a synthesized result may fail to contain the subject-specific characteristics of the corresponding input image when those geometric characteristics are not represented in the training dataset. In this study, a facial sketch synthesis system is proposed based on the parametric 2DDCM approach [8]. In contrast to MRF-based frameworks, the facial sketches produced by the 2DDCM approach preserve the global facial geometry of the input photo.
DOI: 10.12792/icisip2016.083

Facial Sketch Synthesis Framework
The proposed framework is illustrated in Fig. 1 and comprises two modules, i.e., the global and local component synthesis modules. The first module learns the global facial geometry correlation between photo and sketch by applying the 2DDCM method [8] to N pairwise photo/sketch samples. The significant features of this correlation are extracted by the 2DDCM model, U_g, from which the 2DDCM transformation from photo to sketch, T_PS^g, is constructed. Note that this global 2DDCM model captures the global facial geometry but largely neglects the local detailed textures.
In the second module, the 2DDCM approach is applied in a patch-based manner. The goal of this module is to estimate the high-frequency components lost in the global module. In the synthesis process, the photo residual P_t^l of the current test photo P_t is reconstructed first. P_t^l is the photo high-frequency component lost in the global module, and it is then used to predict the lost sketch component S_t^l by applying the local patch-based 2DDCM approach.
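The residual-extraction step can be sketched numerically as follows, under the illustrative assumption that the global model reconstructs a test photo by projecting its columns onto a learned subspace. All data here are random stand-ins, and the names `global_reconstruction` and `photo_residual` are our own, not taken from [8].

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N, d = 8, 6, 20, 3          # image size, sample count, subspace size

# Toy training photos standing in for the real dataset.
photos = rng.random((N, m, n))
P_mean = photos.mean(axis=0)

# Column covariance of the photos and its leading eigenvectors:
# a stand-in for the "global" photo subspace used for reconstruction.
C = sum((P - P_mean) @ (P - P_mean).T for P in photos) / N
_, eigvecs = np.linalg.eigh(C)
U_p = eigvecs[:, -d:]             # top-d eigenvectors, shape (m, d)

def global_reconstruction(P):
    """Low-frequency part of P captured by the global subspace."""
    return P_mean + U_p @ U_p.T @ (P - P_mean)

def photo_residual(P):
    """High-frequency component lost by the global model (the role of P_t^l)."""
    return P - global_reconstruction(P)

P_t = rng.random((m, n))          # unseen test photo
P_t_l = photo_residual(P_t)
```

By construction, the residual lies entirely outside the learned subspace, and reconstruction plus residual recovers the test photo exactly.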

Global Facial Component Synthesis
We adopt the 2DDCM approach [8] to learn the correlation between two related but different classes, i.e., the class of facial photos and the class of facial sketches, because the 2DDCM approach is able to synthesize sketches through a geometry-preserving representation. Given a training dataset of N pairwise photo/sketch samples {(P_i, S_i)}_{i=1}^N, each pairwise sample (P_i, S_i) is represented as a two-dimensional combined matrix A_i = [P_i^T, S_i^T]^T, where P_i, S_i ∈ R^{m×n} (m is the image height and n is the image width), and thus A_i ∈ R^{2m×n}. Accordingly, the covariance matrix of {A_i}_{i=1}^N is calculated as C = (1/N) Σ_{i=1}^N (A_i − Ā)(A_i − Ā)^T, where Ā = (1/N) Σ_{i=1}^N A_i is the mean matrix of the training samples, and C_U = U^T C U is the covariance of all column patterns projected onto the subspace spanned by U. The objective of the 2DDCM approach is to find the subspace U that maximizes the trace of C_U, and U can be obtained by applying the singular value decomposition (SVD) process to C (as performed in [8], albeit using the vertical column subspace only for the sketch synthesis case).
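This learning step can be sketched numerically as follows, with synthetic data standing in for real photo/sketch pairs (all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, N, d = 5, 4, 12, 3

photos   = rng.random((N, m, n))
sketches = 0.5 * photos + 0.1 * rng.random((N, m, n))   # correlated toy pairs

# Each pairwise sample is stacked vertically: A_i = [P_i; S_i], shape 2m x n.
A = np.concatenate([photos, sketches], axis=1)
A_mean = A.mean(axis=0)

# Covariance of the column patterns of the combined samples (2m x 2m).
C = sum((Ai - A_mean) @ (Ai - A_mean).T for Ai in A) / N

# The combined subspace U maximizing tr(U^T C U): leading eigenvectors of C.
w, V = np.linalg.eigh(C)
U = V[:, ::-1][:, :d]            # top-d eigenvectors, shape (2m, d)
```

Because C is symmetric positive semidefinite, its eigendecomposition coincides with its SVD, so the leading eigenvectors give the maximizing subspace directly.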
Partitioning the basis as U = [U_P^T, U_S^T]^T, where U_P and U_S correspond to the photo and sketch halves of the combined samples, the subspace U carries the significant pairwise features. Based on this basis, the 2DDCM transformation from photo to sketch is constructed as a regularized mapping of the form T_PS^g(P_t) = S̄ + U_S (U_P^T U_P + ε Λ_U)^{-1} U_P^T (P_t − P̄), where ε = 0.1 is a user-predefined parameter and Λ_U is the diagonal matrix of eigenvalues associated with U. We denote this transformation as T_PS^g, i.e., the 2DDCM transformation from photo to sketch performed at the whole-face level for the global appearance transformation.
Fig. 4 shows the synthesized result S_t^l obtained using two different K values in the local 2DDCM transformation learning process. It can be observed that when K = 300, the synthesized results are blurred but preserve the correct local geometry. Conversely, for K = 30, the synthesized results contain more detailed textures but also noise, which produces unexpected lines or spots. In this study, we set the value of K to 80.
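One plausible reading of this transformation (the formula was damaged in extraction, so the regularized form below, including Λ_U and ε = 0.1, is our reconstruction and should be checked against [8]) can be exercised on the same kind of synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, N, d, eps = 5, 4, 12, 3, 0.1

photos   = rng.random((N, m, n))
sketches = 0.5 * photos + 0.1 * rng.random((N, m, n))   # correlated toy pairs

A = np.concatenate([photos, sketches], axis=1)           # A_i = [P_i; S_i]
A_mean = A.mean(axis=0)
C = sum((Ai - A_mean) @ (Ai - A_mean).T for Ai in A) / N
w, V = np.linalg.eigh(C)
U, lam = V[:, ::-1][:, :d], w[::-1][:d]                  # basis + eigenvalues

U_P, U_S = U[:m], U[m:]                  # photo / sketch halves of U
P_mean, S_mean = A_mean[:m], A_mean[m:]

def T_PS_g(P_t):
    """Regularized global photo-to-sketch mapping (our reconstruction)."""
    coeff = np.linalg.solve(U_P.T @ U_P + eps * np.diag(lam),
                            U_P.T @ (P_t - P_mean))
    return S_mean + U_S @ coeff
```

A useful sanity check of this form: mapping the mean photo returns exactly the mean sketch, since the centered coefficient vector is zero.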

Experiment Results
We validated our method using the CUHK database. Figure 7 shows the synthesis results obtained using various existing methods, namely the eigen-transformation method [6], the one-dimensional canonical correlation analysis (1DCCA) method [10], the two-dimensional CCA (2DCCA) method [3], the direct combined model (1DDCM) [7], and the two-dimensional DCM (2DDCM) method [8]. It is observed that the results synthesized by the 2DDCM and 2DCCA approaches are qualitatively better than the others. Moreover, the images synthesized using the 2DDCM method (used in the present study) are clearer than those obtained using the 2DCCA approach. Notably, the results synthesized by the 1D-based methods are less recognizable as the input subject, since the subject-specific geometry details are lost in the synthesis process.
Figure 8 shows the synthesis results obtained from the proposed framework for the non-profile images considered in [13]. Note that the framework was once again trained using the CUHK dataset, as in the first experimental setting. As shown, the sketches synthesized using the global module contain obvious facial traces in the central region of the image.

Conclusion
We have developed a new facial sketch synthesis framework based on a training dataset of photo/sketch pairs. This work was motivated by the inefficiency and unsatisfactory results of representing two related images as two separate 1D vectors. Notably, both the global and local components are included in the final synthesized results. Compared with existing approaches, the proposed approach, which embeds the facial geometry patterns into the 2D combined subspace, advances facial sketch synthesis technology.

Fig. 1 .
Fig. 1. Proposed 2DDCM-based synthesis framework, comprising the global facial sketch synthesis module, the local detailed texture synthesis module, and the enhancement module.

The residual photo P_t^l is decomposed into several overlapping patches, and each patch searches for its K nearest-neighbor (K-NN) pairwise patches in the training dataset. These similar pairwise samples are used to construct the local 2DDCM models, U_p^l, which infer the corresponding local patches of S_t^l. Combining S_t^l with the global synthesized result S_t^g yields the final system output S_t^*.
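The patch decomposition and K-NN search in this local module can be sketched as follows. The data are toy stand-ins, `extract_patches` and `knn_patch_pairs` are illustrative names of our own, and the plain Euclidean patch distance is an assumption rather than the paper's exact matching criterion.

```python
import numpy as np

rng = np.random.default_rng(3)

def extract_patches(img, size=4, step=2):
    """Overlapping patches (step < size gives the overlap described above)."""
    h, w = img.shape
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, step)
            for c in range(0, w - size + 1, step)]

def knn_patch_pairs(query, photo_patches, sketch_patches, K):
    """Return the K photo/sketch patch pairs whose photo part is closest
    to the query patch under Euclidean distance."""
    dists = [np.linalg.norm(query - p) for p in photo_patches]
    idx = np.argsort(dists)[:K]
    return [(photo_patches[i], sketch_patches[i]) for i in idx]

# Toy residual images standing in for the P_t^l / S_t^l training pairs.
photo_res = rng.random((12, 12))
sketch_res = rng.random((12, 12))
photo_patches = extract_patches(photo_res)
sketch_patches = extract_patches(sketch_res)

query = photo_patches[0]
pairs = knn_patch_pairs(query, photo_patches, sketch_patches, K=3)
```

In the full system, each retrieved set of K pairs would feed the local 2DDCM model for that patch position; here the search simply returns the closest pairs.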

Fig. 2 .
Fig. 2. Residual sketches obtained using the difference between the ground truth sketches and the global synthesized results.

Fig. 3 .
Fig. 3. Illustration of the generation process for the local component of the image, I^l, and the local component of the sketch, S^l, using the 2DDCM-based bidirectional transformations T_PS^g and T_SP^g.

Fig. 4 .
Fig. 4. Residual sketch generation based on the 2DDCM approach using different values of K in the proposed local model. Upper: K = 30; lower: K = 300.

Fig. 5 .
Fig. 5. Example synthesized results for female subjects produced by the proposed framework. (a) Input photos; (b) ground-truth sketches; (c) synthesized results obtained from the global module; (d) synthesized results obtained from both modules of the proposed framework.

Fig. 6 .
Fig. 6. Example synthesized results for male subjects produced by the proposed framework. (a) Input photos; (b) ground-truth sketches; (c) synthesized results obtained from the global module; (d) synthesized results obtained from both modules of the proposed framework.
The CUHK database contains 188 photo/sketch pairwise samples, of which 88 are selected for training while the remaining 100 are used for testing. Example synthesized results obtained from the proposed framework are shown in Figs. 5 and 6. It can be seen that the results obtained using the global module preserve the facial geometry (Figs. 5(c) and 6(c)). Figures 5(d) and 6(d) show the results obtained from the local module (i.e., the global module followed by the local module). Compared with the results of the global module alone, those obtained from the combined global and local modules contain more distinct sketch lines and curved strokes.

Fig. 8 .
Fig. 8. Synthesized results obtained for the images used in Ref. [13] when using the CUHK dataset (first experimental case) for training. First row: input images; second row: synthesized results from the global module; third row: synthesized results obtained by the local 2DDCM in the local module.