Intelligent Stereo Image Processing in Mobile System

Stereo rectification based on epipolar geometry is an image transformation process that aligns stereo images so that they are coplanar. It transforms pairs of conjugate epipolar lines from multiple views to be collinear and parallel to the horizontal axis. For stereo image pairs captured by the low-cost dual-camera systems of smartphones, additional distortion correction is required to achieve better visual effects in 3D display applications. In this paper, a novel image processing algorithm is proposed that solves the stereo rectification and distortion correction problems simultaneously within an integrated optimization system. Synthetic experiments and real-data tests show that the proposed method reduces the stereo rectification error in multi-resolution dual-camera systems. In addition, two applications based on the proposed rectification technique are presented. The experimental results also show that stereoscopic photographs processed by the proposed system achieve better visual quality.


Introduction
A stereo vision system aims to extract relative depth information from multiple images captured from different views. This technique has been widely applied to 3D modeling, robot vision, passive object scanning, and 3D display applications. Stereo rectification transforms pairs of images onto a common image plane using epipolar geometry. In a stereo vision system, image rectification plays an important role in simplifying point registration and disparity estimation. With the popularization of 3D devices and applications, large amounts of 3D content are recorded by low-cost stereo camera systems or mobile devices with dual-camera modules. The distortion in these low-quality images dramatically decreases the accuracy of stereo rectification. For such low-quality stereo image or video pairs, this paper proposes a novel computer vision algorithm that solves the image rectification and distortion correction problems simultaneously. The disparity map estimated with the proposed rectification method is then used in two applications, image refocusing and segmentation. Fig. 1 shows the input data of the proposed system, which consist of a high-resolution image and a low-resolution image with serious distortion.
Estimating disparity maps from stereo image pairs is essential to most 3D stereo applications. To derive disparity information efficiently, the images must be accurately rectified to be coplanar based on the epipolar geometry. In traditional stereo rectification methods, the epipolar lines of the input images are aligned to determine the homography matrix between two or more images. Fusiello et al. (1) developed a well-known linear rectification algorithm for calibrated stereo cameras. It has been implemented in various 3D data processing toolkits and widely used in other 3D research. They also extended this work to uncalibrated cases with quasi-Euclidean epipolar estimation. In addition, Heller and Pajdla (2) map epipolar curves onto circles to rectify omnidirectional images. The image quality of view synthesis is closely related to the estimated depth information, and stereo matching relies on accurate stereo rectification. The rectification error increases dramatically in the presence of optical defects such as radial distortion, so distortion removal or distortion tolerance has become an important issue in stereo vision research. Scharstein (3) enhanced view synthesis results to recover the texture and depth information of unknown areas. Levin and Durand (4) adopted a dimensionality gap light field prior to improve the difference of light in multiple views. Recently, Zhao et al. (5) proposed a D-NOSE model based on depth-image-based rendering techniques to generate virtual views. This method guarantees both error-free pixel mapping and invariant occlusion ordering. Closer to the problem setting of this paper, Ringaby and Forssén (6) considered the distortion problem in the rectification process and proposed a system based on an RGB-D sensor on a moving platform. They estimate the camera trajectory in the time-series domain and transform the 3D points to solve the distortion problem.
In this paper, a new stereo rectification method is proposed that accounts for distortion factors, which may be caused by a low-cost image acquisition system. The input images are assumed to be calibrated but of different resolutions. The images captured by the main camera serve as reference images and are assumed to be free of distortion, whereas the low-resolution images are corrected with the estimated distortion parameters. The main idea of this research is to solve for the parameters of the distortion model together with the stereo rectification. The experimental results show that the proposed system achieves better rectification results under various distortion conditions. In addition, the accurate rectification results are used to estimate high-quality disparity maps. Two applications, refocusing and segmentation, are then presented based on the disparity maps. Fig. 2 shows an example of the image pair rectified from the input data (Fig. 1) by the proposed method.

Proposed Method
This section describes the proposed rectification method with distortion correction, which improves the rectification accuracy by estimating the distortion factors. Without distortion correction, the rectified images have a relatively high rectification error, as shown in Fig. 3. Fig. 3 visualizes the error of the matched point pairs with different colors, where the dark blue points indicate matched points with large errors. It shows that pixels far from the image center cannot be rectified correctly. Section 2.1 reviews the fundamental models of multiple view geometry used in the proposed system. The distortion model and the integrated optimization system that rectifies the stereo images with distortion correction are described in the following sections.

Stereo Rectification
Stereo rectification simplifies the matching of corresponding points in stereo images. It transforms the epipolar lines of conjugate images to be collinear and parallel to the horizontal axis. Because matching is then reduced to a 1D search along image rows, the subsequent steps of stereo vision are simplified on the rectified images. Hartley (7) introduced the mathematical theory of stereo rectification and implemented it for calibrated stereo images. Isgro and Trucco (8) extended this work and proposed a projective rectification method that does not require explicit computation of the fundamental matrix. Fusiello et al. (1) presented a similar linear method that has been widely used in subsequent 3D stereo applications. To achieve more robust estimation, the rectification process has been formulated as a distortion minimization problem (9,10). Some researchers have also extended the calibrated methods to the more complex case of uncalibrated image pairs (11).
In this research, uncalibrated images, which are the raw images captured by the stereo camera system, are considered as the system inputs. SIFT feature extraction (12) and the Euclidean distance between feature vectors are used to match corresponding feature pairs in the stereo images. Let $m_l$ and $m_r$ be a pair of corresponding points in the left and right image, respectively, and let $F$ denote the fundamental matrix. The epipolar geometry between a pair of images can be formulated as follows:

$$m_r^{T} F m_l = 0 \tag{1}$$

Based on the constraints of epipolar geometry, the images captured from different viewpoints can be transformed into an identical coordinate system. The projection matrices of the old left image, the old right image, the new left image, and the new right image are denoted by $(P_{ol}, P_{or}, P_{nl}, P_{nr})$. The relation between the old matrix and the new one is given by the rectification homography $H_l$:

$$P_{nl} = H_l P_{ol} \tag{2}$$

Following equation (2), $P_{nr}$ is likewise the product of $H_r$ and $P_{or}$. Each projection matrix can be further decomposed as:

$$P = K [R \mid t] \tag{3}$$

where $K$, $R$, and $t$ denote the calibration matrix, the rotation matrix, and the translation vector, respectively. Based on the above equations, $H_l$ can be estimated as follows:

$$H_l = P_{nl,1:3} P_{ol,1:3}^{-1} = K_{nl} R_l K_{ol}^{-1} \tag{4}$$

where $R_l = R_{nl} R_{ol}^{-1}$ denotes the rotation matrix that rectifies the old rotation matrix $R_{ol}$ to the new one $R_{nl}$, and $P_{1:3}$ denotes the left 3x3 submatrix of $P$. The homography $H_r$ for the right image can be estimated in the same way from $P_{or}$, $P_{nr}$, $R_r$, $K_{or}$, and $K_{nr}$. Combining equations (1) and (2), the epipolar geometry constraint can be formulated as the following equation:

$$(H_r m_r^j)^{T} [u_1]_\times (H_l m_l^j) = 0 \tag{5}$$

where $[u_1]_\times$, with $u_1 = (1, 0, 0)^T$, is the specific form of the fundamental matrix for a rectified image pair, $[\cdot]_\times$ denotes the 3x3 skew-symmetric matrix that represents the cross product with a 3-vector, and $m_l^j$ and $m_r^j$ are the jth corresponding pair in the stereo images. Note that the calibration matrix $K$ has 5 degrees of freedom (DOF), the rotation matrices $R_l$ and $R_r$ have 3 DOF each, and the homography matrices $H_l$ and $H_r$ have 8 DOF each. From equations (4) and (5), the fundamental matrix can be rewritten as equation (6):

$$F = H_r^{T} [u_1]_\times H_l = K_{or}^{-T} R_r^{T} K_{nr}^{T} [u_1]_\times K_{nl} R_l K_{ol}^{-1} \tag{6}$$

To estimate the fundamental matrix, the Sampson error (13), which is a first-order approximation of the geometric re-projection error, is used in this research. For the jth corresponding pair $(m_l^j, m_r^j)$ and the associated fundamental matrix $F$, it is defined as equation (7):

$$E_j = \frac{\left(m_r^{jT} F m_l^j\right)^2}{(F m_l^j)_1^2 + (F m_l^j)_2^2 + (F^{T} m_r^j)_1^2 + (F^{T} m_r^j)_2^2} \tag{7}$$

where $(\cdot)_i^2$ is the square of the ith entry of the vector.

With the corresponding point pairs, the error term can be minimized by least-squares algorithms.
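The Sampson error of equation (7) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes the rectified-pair fundamental matrix has the form $F = H_r^T [u_1]_\times H_l$ with $u_1 = (1, 0, 0)^T$, that points are given as homogeneous row vectors, and the function names (`skew`, `rectified_fundamental`, `sampson_error`) are hypothetical.

```python
import numpy as np

def skew(v):
    """Return [v]_x, the 3x3 matrix with skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def rectified_fundamental(H_l, H_r):
    """Assumed form F = H_r^T [u1]_x H_l with u1 = (1, 0, 0)."""
    return H_r.T @ skew(np.array([1.0, 0.0, 0.0])) @ H_l

def sampson_error(F, m_l, m_r):
    """Sampson error per correspondence; m_l and m_r are N x 3 homogeneous points."""
    Fm_l = m_l @ F.T    # row j holds F m_l^j
    Ftm_r = m_r @ F     # row j holds F^T m_r^j
    num = np.sum(m_r * Fm_l, axis=1) ** 2
    den = Fm_l[:, 0] ** 2 + Fm_l[:, 1] ** 2 + Ftm_r[:, 0] ** 2 + Ftm_r[:, 1] ** 2
    return num / den
```

With identity homographies the error reduces to half the squared row difference, so a pair of points lying on the same image row has zero error, which matches the intent of rectification.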

Image Distortion Model
Distortion is a common image defect caused by optical aberration and can be divided into two major types: radial distortion and tangential distortion. Radial distortion is mainly caused by imperfect lenses, while tangential distortion is related to manufacturing defects of the camera. Radial distortion falls into three major categories: barrel distortion, pincushion distortion, and mustache distortion. Wide-angle lenses lead to barrel distortion, in which horizontal and vertical reference straight lines are transformed into curves that bend outwards from the image center. In contrast, pincushion distortion makes the image appear pinched at its center, and the reference lines are transformed into curves that bend inwards toward the image center. Mustache distortion is a mixture of the previous two types.
All of these radial distortion effects can be modeled by a polynomial equation proposed by Zhang (14). Let $(x_u, y_u)$ denote the image coordinates captured by an ideal pinhole camera and $(x_d, y_d)$ the corresponding point coordinates in the distorted image. The polynomial model of distortion is given by equations (8) and (9), where $K_n$ and $P_n$ denote the nth radial distortion coefficient and tangential distortion coefficient, respectively. In addition, $r$ represents the Euclidean distance between the undistorted image point and the distortion center $(x_c, y_c)$:

$$r = \sqrt{(x_u - x_c)^2 + (y_u - y_c)^2} \tag{10}$$

As barrel distortion and pincushion distortion are both radial distortions, only the radial coefficients $K_n$ are required to model them. An image warping technique (15) can be applied to correct the distortion based on the matching pairs of $(x_u, y_u)$ and $(x_d, y_d)$. In addition to the use of corresponding point pairs, Prescott and McLean (16) presented a line-based method to estimate the distortion parameters. Weng et al. (17) presented a camera calibration model that estimates the calibration parameters and distortion factors in two stages through an iterative nonlinear optimization process. In this paper, the matched point pairs of the stereo images are used to solve for the stereo rectification and the distortion coefficients simultaneously.
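To make the radial part of the model concrete, the following sketch applies a two-coefficient radial polynomial to undistorted points. The exact form used here, $p_d = c + (p_u - c)(1 + K_1 r^2 + K_2 r^4)$ with $r$ from equation (10), is an assumption (a common textbook form) and not necessarily the exact equations (8) and (9) of this paper; tangential terms are omitted, and the function name is hypothetical.

```python
import numpy as np

def apply_radial_distortion(points, center, k1, k2=0.0):
    """Map undistorted points (N x 2) to distorted ones.

    Assumed model: p_d = c + (p_u - c) * (1 + k1 * r^2 + k2 * r^4),
    where r is the distance from the undistorted point to the center c.
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    offset = points - center
    r2 = np.sum(offset ** 2, axis=1, keepdims=True)  # squared radius per point
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2
    return center + offset * factor
```

Under this sign convention a negative `k1` pulls points toward the center (barrel-like) and a positive `k1` pushes them outward (pincushion-like). Coordinates are usually normalized before such a model is applied so that the coefficients stay in a small range.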

Stereo Rectification with Distortion Correction
In this research, a novel stereo rectification system with distortion correction is proposed. The distortion parameters are estimated during the rectification process from the matched point pairs of the stereo images. Based on equations (7), (8), and (9), the cumulative cost function can be rewritten as equation (11). After the corresponding pairs of feature points are determined by the SIFT algorithm (12), the fundamental matrix and the distortion parameters are estimated by minimizing the cost function.
In the practical setting of the proposed system, only $K_1$ and $K_2$ are used to model the radial distortion. Each vector $S_i$ contains eighteen values in total, and the input parameter C should be larger than 9 for more robust estimation. Although one of the input images is recorded at low resolution, more than three hundred pairs of matched points can be obtained after simple outlier elimination by fundamental matrix calculation. The proposed system avoids the heavy computation of the RANSAC framework, so the computation time is decreased effectively. Fig. 4 shows the epipolar lines estimated from the input image pair (Fig. 1) with radial distortion.
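The idea that matched point pairs alone can constrain a distortion coefficient can be illustrated with a deliberately simplified toy problem. The sketch below is not the paper's cost function (11): it assumes the pair is already rectified except for radial distortion in the right image, uses a one-parameter correction model $p_u = c + (p_d - c)(1 + K_1 r_d^2)$ (an assumption), and solves for $K_1$ in closed form by least squares on the row misalignment. The function name is hypothetical.

```python
import numpy as np

def estimate_k1_from_rows(left_pts, right_pts, center):
    """Closed-form least-squares K1 that best aligns the rows of matched points.

    Assumed correction model (not from the paper):
        p_u = c + (p_d - c) * (1 + k1 * r_d^2)
    The residual of pair j is the row gap after correction, which is linear in k1.
    """
    left_pts = np.asarray(left_pts, dtype=float)
    right_pts = np.asarray(right_pts, dtype=float)
    off = right_pts - np.asarray(center, dtype=float)
    r2 = np.sum(off ** 2, axis=1)
    a = off[:, 1] * r2                     # coefficient of k1 in each residual
    b = left_pts[:, 1] - right_pts[:, 1]   # row gap before correction
    return float(a @ b / (a @ a))
```

In the full problem the homography parameters and the distortion coefficients are unknown at the same time, so the residual is no longer linear and an iterative nonlinear least-squares solver would be needed instead of this closed form.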

Experimental Results
The proposed stereo rectification method with distortion correction is first verified on synthetic images in simulation experiments. To evaluate the robustness of the proposed system under different levels of radial distortion, various types and extents of synthetic distortion parameters are examined in Sections 3.1 and 3.2. The input stereo images consist of a high-resolution image captured by the main camera without distortion and a lower-resolution image that may contain serious radial distortion.
Five stereo image pairs (named T1 to T5) with various synthetic distortion settings, shown in Fig. 5, are examined in the following experiments. The SIFT algorithm (12) is applied to extract feature points from the stereo images, and the corresponding matched pairs are found by Euclidean distance in feature space. The rectification error is defined as the vertical distance between corresponding point pairs in the rectified images. Finally, Section 3.3 presents the experimental results of the proposed method on real images captured by a stereo camera. All experiments were performed on a PC with an Intel Xeon E3-1230 V2 CPU, which has four 3.3 GHz cores, and 16 GB of DDR RAM. The execution time for each stereo image pair is about 4 seconds with the default setting, C=20 and S=200.
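The rectification error defined above can be computed as follows. This is a minimal sketch: it assumes the error is aggregated as the mean absolute row difference over all matched pairs (the aggregation is an assumption), and the function names are hypothetical.

```python
import numpy as np

def apply_homography(H, pts):
    """Apply a 3x3 homography to N x 2 pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    hom = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous coordinates
    mapped = hom @ H.T
    return mapped[:, :2] / mapped[:, 2:3]            # back to pixel coordinates

def rectification_error(H_l, H_r, pts_l, pts_r):
    """Mean vertical distance (in pixels) between matched points after rectification."""
    y_l = apply_homography(H_l, pts_l)[:, 1]
    y_r = apply_homography(H_r, pts_r)[:, 1]
    return float(np.mean(np.abs(y_l - y_r)))
```

A perfectly rectified pair gives an error of zero, because every match then lies on the same image row in both views.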

System Performance in Synthetic Experiments
The first experiment examines whether the proposed system can accurately estimate the distortion parameter $K_1$, whose true value is known for the synthetically distorted images. Five stereo test images, labeled T1 to T5, with various synthetic distortion settings are examined in this experiment. The experimental results are illustrated in Fig. 6. In this figure, the horizontal axis represents the eleven given synthetic distortion parameters, which range from -1 to 1, and the vertical axis represents the error between the estimated value and the ground truth.
The error of the estimated distortion parameter is illustrated in Fig. 6. The estimated values can be higher or lower than the ground truth without any systematic tendency. Except for T5, all the estimated parameters are very close to the ground truth, and the overall average error is about 0.0129. Without additional synthetic distortion, the proposed system correctly estimates $K_1$ as 0 in most test images, including T1, T2, T3, and T4. Across the synthetic distortion settings, the estimation error increases as $K_1$ approaches 1 or -1. The largest estimation error, about 0.13, appears in T5 with $K_1$=1. The most accurate estimation occurs in T1 with $K_1$=0, where the error is around 0.001. On average, the estimation error of the distortion parameter is smallest at $K_1$=0 and largest at $K_1$=1. The average estimation error over the different synthetic distortion settings ranges from 0.0079 to 0.0167.
Among the five test stereo images, the estimation error of four image pairs stays between -0.02 and 0.02. The average error of these pairs is about 0.0066. T3 achieves the best estimation result, with an average error of less than 0.0064. The worst case is T5, which has an average error of 0.0381.
The large error of T5 is due to the estimate of 1.13 obtained when $K_1$ is set to 1. This difference of 0.13 between the estimated value and the ground truth seriously increases the average error. Since T5 is a natural image pair with complicated and repeated textures, such as grass and trees, it is difficult to extract discriminative feature points and find correct matched pairs. In spite of the larger error, the corresponding rectified images and disparity maps can still produce a good 3D stereo visual effect.
Following the settings of the previous experiment, this section describes the image rectification performance of the proposed method, which is shown in Fig. 7. The horizontal axis represents the eleven synthetic distortion parameters ranging from -1 to 1, and the vertical axis represents the rectification error in pixels. The vertical error of the rectified images is about 0.3404 pixels on average.
When $K_1$=0, the rectification error of the five image pairs ranges from 0.1886 to 0.3001. In this experiment, T1 has the best result and T5 the worst. The rectification errors of T1 and T5 are 0.1886 and 0.3002, respectively. The average rectification error is about 0.2516. Although the distortion parameter $K_1$ can be correctly estimated, the rectification error still increases as the value of $K_1$ increases. For example, the rectification error of T1 gradually increases from 0.1886 to 0.3495 as $K_1$ increases from 0 to 1. Compared with T1, the average error of T4 is larger under every synthetic distortion setting, but the trend of the curve is similar to that of T1. The rectification error appears to converge gradually when $K_1$ is larger than 0.4 or smaller than -0.4.
In the previous experiment, the estimated $K_1$ of T5 has the largest error. However, the rectification error of T5 is not the worst among the test samples. When $K_1$ is larger than 0.5 or smaller than -0.4, its rectification error is even lower than that of the other four image pairs. This finding suggests that the large estimation error of the distortion parameter is mainly related to imperfect matched point pairs.

More Complex Distortion Model
In real images, it is difficult to model the distortion effects with a simple division model. In this experiment, a higher-order distortion model with two parameters, labeled $K_1$ and $K_2$, is used to synthesize the radial distortion. $K_1$ and $K_2$ are the radial distortion coefficients of $r^2$ and $r^4$, respectively. These two parameters are estimated simultaneously with the stereo rectification by the proposed method, and the rectification error of each stereo image pair is then measured. This experiment verifies the robustness of the proposed method when handling a more complicated distortion model. Table 1 lists the average rectification error in pixels over the same five test image pairs.
The largest rectification error, about 0.4048, occurs when $K_1$=1 and $K_2$=0. In contrast, when the original image pair is used as input, that is, when both $K_1$ and $K_2$ equal zero, the complex model reduces the rectification error from 0.2922 to 0.2289. This suggests that the input image without synthetic distortion may already contain slight radial distortion from the original optical system. Table 1 shows that the average error is reduced by the complex model in most cases. In addition, $K_1$ has more impact on the rectification results than $K_2$ because of the form of the distortion model.
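The relative impact of the two coefficients can be checked with a short worked comparison. It assumes that the radial terms are $K_1 r^2$ and $K_2 r^4$, as described at the start of this section, and that image coordinates are normalized so that $r \le 1$; the normalization is an assumption that is not stated explicitly in the text.

```latex
\[
\frac{|K_2|\, r^4}{|K_1|\, r^2} \;=\; \frac{|K_2|}{|K_1|}\, r^2 ,
\qquad\text{so for } |K_1| = |K_2| \text{ and } r = 0.5:\quad
\frac{|K_2|\, r^4}{|K_1|\, r^2} = 0.25 .
\]
```

Under these assumptions the fourth-order term contributes at most as much as the second-order term for equal coefficient magnitudes, and considerably less for points away from the image border.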

Real-data Experiments
Finally, the proposed method is tested on real images captured by a stereo camera. Fig. 8 shows the rectification results of the same scene with different distortion models. Fig. 8(a) is the rectified image obtained with the complex distortion model examined in Section 3.2. Fig. 8(b) uses the same setting except that only $K_1$ is estimated. Fig. 8(c) is the traditional stereo rectification without distortion correction. The proposed method visibly corrects the distortion and achieves better rectification accuracy. Three further image pairs of different scenes are tested with the proposed method, and the results are shown in Fig. 9.
The images rectified by the proposed method are used as the input of the stereo matching system proposed by Yang (18). Fig. 10 shows the estimated depth values for Fig. 2. Stereo rectification with distortion correction clearly achieves better rectification accuracy, and this improvement dramatically enhances the performance of stereo matching and disparity estimation.

Image Refocusing Application
The disparity maps derived from the proposed rectification process are applied to image refocusing. In this paper, a Gaussian function, shown in equation (13), is used to implement the blurring effect. The parameter μ is controlled by the depth difference between the pixel being processed and the reference pixel. This means that stronger synthetic blur is added to pixels that are farther from the reference pixel in depth.
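A depth-dependent blur of this kind can be sketched with NumPy alone. The sketch below is an illustration under stated assumptions, not the paper's equation (13): it takes the Gaussian standard deviation to be proportional to the absolute depth difference from the reference depth, quantises that difference into a few blur levels, and works on a single-channel image. The names `gaussian_blur`, `refocus`, `gain`, and `levels` are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D normalized Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D image with edge padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def refocus(img, depth, ref_depth, gain=2.0, levels=4):
    """Blur each pixel more strongly the farther its depth is from ref_depth."""
    img = np.asarray(img, dtype=float)
    diff = np.abs(np.asarray(depth, dtype=float) - ref_depth)
    if diff.max() == 0:
        return img.copy()
    # Quantise the depth difference into blur levels; level 0 stays sharp.
    idx = np.rint(diff / diff.max() * (levels - 1)).astype(int)
    sigmas = gain * diff.max() * np.arange(levels) / (levels - 1)
    out = img.copy()
    for i in range(1, levels):
        mask = idx == i
        if mask.any():
            out[mask] = gaussian_blur(img, sigmas[i])[mask]
    return out
```

Quantising into a small number of levels keeps the cost at a few full-image blurs instead of one kernel per pixel, at the price of visible steps if the depth map varies smoothly and `levels` is small.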

Image Segmentation Application
By considering disparity and color information simultaneously, the normalized cuts algorithm can be improved to segment objects from images more accurately. The proposed method can segment the target object with a simple click operation. Fig. 12 shows image segmentation results that combine color similarity and depth similarity in the normalized cuts algorithm. Although the pixels have different colors, they can be grouped by their estimated depth values. In addition, Fig. 13 shows a further application that composites the clipped image segments onto another background image. This technique can be applied to rapid image editing applications.
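One way to combine the two cues is to multiply a color affinity by a depth affinity before running a normalized-cut bipartition. The sketch below assumes this product-of-Gaussians weighting and a sign threshold on the second generalized eigenvector; the exact weighting used in this paper is not given in the text, so the form, the function names, and the parameters `sigma_c` and `sigma_d` are assumptions.

```python
import numpy as np

def combined_affinity(colors, depths, sigma_c=0.5, sigma_d=0.2):
    """Affinity between all pairs of pixels from color (N x 3) and depth (N)."""
    colors = np.asarray(colors, dtype=float)
    depths = np.asarray(depths, dtype=float)
    dc = np.sum((colors[:, None, :] - colors[None, :, :]) ** 2, axis=2)
    dd = (depths[:, None] - depths[None, :]) ** 2
    return np.exp(-dc / sigma_c ** 2) * np.exp(-dd / sigma_d ** 2)

def ncut_bipartition(W):
    """Two-way normalized cut: sign of the second generalized eigenvector."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)           # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]         # map back to the generalized problem
    return fiedler > 0
```

The dense N x N affinity is only practical for a small number of pixels or superpixels; a real image would need a sparse neighbourhood graph and a sparse eigensolver.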

Conclusions
In this paper, a novel stereo rectification method with distortion correction is proposed. The matched feature pairs from the stereo images are used to estimate the parameters of the radial distortion model and to rectify the images to be coplanar. The experimental results show that the proposed system achieves better rectification results effectively and efficiently. The proposed method can be applied to multi-resolution dual-camera systems that provide a high-resolution image with better optical quality and a low-resolution image with serious radial distortion. As future work, the proposed system should cover more image defect models, such as image noise and radiometric variations. In addition, this framework is used to estimate accurate disparity maps and is extended to two applications, image refocusing and image segmentation, using Gaussian blur and normalized cuts, respectively. The segmented regions of stereo images may carry rich information for improving the accuracy and efficiency of 3D vision systems.

Fig. 1. The real multi-resolution image pair captured by a low-cost stereo system.

Fig. 2. The rectification results of an image pair with different image resolutions.

Fig. 3. The visualization of the estimated rectification error without distortion correction.

Fig. 4. The image rectification result of the input data (Fig. 1) by the proposed method, with the epipolar lines illustrated.

Fig. 5. Image data with different synthetic radial distortion effects.

Fig. 6. The error of the estimated distortion parameter in various synthetic parameter settings.

Fig. 7. The rectification error with different synthetic distortion parameters.
Fig. 11 shows the experimental results of the refocusing application. Fig. 11(a) is the original image captured by the main camera, and Fig. 11(b) is the depth map estimated with the proposed rectification method. Fig. 11(c) is the first refocusing effect, focused on the frontal object, and Fig. 11(d) is the second refocusing effect, focused on the backpack.

Fig. 10. (a) The estimated depth values of the rectified images by the proposed system. (b) The estimated depth values of the rectified images without using the distortion model.

Fig. 11. The image refocusing experimental results: (a) original image, (b) disparity map, (c) focus on the drink, and (d) focus on the backpack.

Fig. 12. The image segmentation experimental results: (a) original image, (b) disparity map, (c) segmentation of the girl with the black hat, and (d) segmentation of the monitor screen region from the image.

Fig. 13. The image editing application: (a) original image, (b) disparity map, (c) frontal desk segmentation, and (d) synthesis in another background.