Improved Pre-focusing Method and Its Application to Three-Dimensional Space Scanning for Detecting an Object

A Three-Dimensional Scanning Method (TDSM) is proposed to detect an object or a person in a three-dimensional Region of Interest (ROI) without a stereo matching process. The TDSM consists of two processes: scanning the ROI within the field of view of multiple cameras, and detecting the object in the ROI. The former is realized by two theoretically derived sets of coefficients for perspective transformations; one cancels the disparities on a plane between the left and center images, and the other cancels those on another plane between the right and center images. The ROI is formed at the intersection of the two planes. The latter is performed by detecting residual disparities between the left and right images, using one-dimensional optical flow. The TDSM was successfully applied to detect a person walking in the field of view of the multiple cameras.


Introduction
Various types of autonomous cars have recently attracted attention and have been developed and tested on actual roads. They constantly sense their surroundings in order to reach their destinations safely. Many types of sensors, such as infrared lasers, cameras, and millimeter-wave radars [1], are used in the sensing process, and computer systems are used to understand the surroundings and to determine the next actions. In fact, TOYOTA has already developed a practical Pre-Crash Safety System [2]. Camera-based systems have also been developed, e.g., SUBARU's EyeSight and HONDA's Intelligent Night Vision system [3]. These systems obtain 3D road information not only for vehicle control but also to detect pedestrians entering the street [4], in order to avoid car-human accidents. The systems, however, need a time-consuming stereo matching process [5]∼[7] to obtain the 3D information. There are at least two approaches to this problem. One is to use a sub-system, such as a GPGPU, to speed up the processing [8]. The other is to adopt a smart method that avoids the stereo matching process altogether. The former solution is straightforward but expensive, while the latter is low-cost and flexible. The authors have previously proposed such a method, named the pre-focusing method (PFM) [9], to detect a pedestrian at a prefixed ROI. The PFM, however, requires a separate calibration for each pre-focused ROI, so a method that avoids these repeated calibrations has been needed.
In this study, we propose a spatial scanning method, the TDSM, that allows ROIs to be positioned freely. Once a camera calibration is performed, the TDSM automatically generates the sets of coefficients for the perspective transformations that produce an ROI at any point in the field of view of the multiple cameras. Disparities between the left and right camera images are evaluated using an optical flow technique [?]. An object or a person is found in the ROI when no disparity is detected there.

Methodology
The proposed TDSM theoretically derives the coefficients for pre-focusing on a 3D ROI at any point in the field of view of the multiple cameras, and then determines whether an object exists there.

Creation of ROI
Suppose three cameras are linearly arranged, and assume their optical axes are parallel with no relative rotation. Each camera has already been calibrated, so each has its own coefficients for the three-dimensional to two-dimensional (3D-2D) perspective transformation:

$$p = \frac{a_{l1}x + a_{l2}y + a_{l3}z + a_{l4}}{a_{l9}x + a_{l10}y + a_{l11}z + 1},\qquad q = \frac{a_{l5}x + a_{l6}y + a_{l7}z + a_{l8}}{a_{l9}x + a_{l10}y + a_{l11}z + 1}, \tag{1}$$

$$s = \frac{a_{c1}x + a_{c2}y + a_{c3}z + a_{c4}}{a_{c9}x + a_{c10}y + a_{c11}z + 1},\qquad t = \frac{a_{c5}x + a_{c6}y + a_{c7}z + a_{c8}}{a_{c9}x + a_{c10}y + a_{c11}z + 1}, \tag{2}$$

$$u = \frac{a_{r1}x + a_{r2}y + a_{r3}z + a_{r4}}{a_{r9}x + a_{r10}y + a_{r11}z + 1},\qquad v = \frac{a_{r5}x + a_{r6}y + a_{r7}z + a_{r8}}{a_{r9}x + a_{r10}y + a_{r11}z + 1}, \tag{3}$$

where $a_{li}\ (i = 1 \sim 11)$ are the coefficients of the 3D-2D perspective transformation from a point $(x, y, z)$ in the global coordinates to a pixel $(p, q)$ in the image captured by the left camera, and $a_{ci}\ (i = 1 \sim 11)$ with pixel $(s, t)$ and $a_{ri}\ (i = 1 \sim 11)$ with pixel $(u, v)$ are those for the center and right cameras, respectively.

When these cameras $C_l$, $C_c$ and $C_r$ observe the same rectangle on a plane A at a distance, their images have disparities among them due to parallax. Consider a 2D-2D perspective transformation from $(p, q)$ to $(s, t)$,

$$s = \frac{b_{l1}p + b_{l2}q + b_{l3}}{b_{l7}p + b_{l8}q + 1},\qquad t = \frac{b_{l4}p + b_{l5}q + b_{l6}}{b_{l7}p + b_{l8}q + 1}, \tag{4}$$

where the coefficients $b_{li}\ (i = 1 \sim 8)$ are calculated from the four pairs of corresponding vertices. Using this transformation, we can register the left camera image onto the center one. This means that transformation (4) removes the relative disparities of any object on the plane A; thus, cameras $C_l$ and $C_c$ are pre-focused on the plane A. We set another plane B and apply the same process to cameras $C_r$ and $C_c$, using the 2D-2D perspective transformation

$$s = \frac{b_{r1}u + b_{r2}v + b_{r3}}{b_{r7}u + b_{r8}v + 1},\qquad t = \frac{b_{r4}u + b_{r5}v + b_{r6}}{b_{r7}u + b_{r8}v + 1}. \tag{5}$$

Transformation (5) cancels the relative disparities of an object on the plane B, just as (4) does for the plane A. Now we have two sets of pre-focused cameras: $C_l$ and $C_c$ on the plane A, and $C_r$ and $C_c$ on the plane B.

Suppose planes A and B have an intersecting line, and suppose an object placed on that line is observed by all cameras at the same time. Let $I_l$, $I_c$ and $I_r$ be the images captured by cameras $C_l$, $C_c$ and $C_r$, respectively, and let $I_{lc}$ and $I_{rc}$ be the images transformed from $I_l$ by (4) and from $I_r$ by (5), respectively. The object observed in $I_{lc}$ and that in $I_c$ have no disparity between them, because the object is on the plane A. Similarly, the object observed in $I_{rc}$ and that in $I_c$ have no disparity between them, because the object is also on the plane B.
In this situation, the object in $I_{lc}$ and that in $I_{rc}$ have no disparity between them. Only an object placed on the intersecting line of the planes A and B satisfies this condition. Thus, the ROI is created on the intersecting line of the planes A and B.
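The eight coefficients of a 2D-2D perspective transformation such as (4) can be obtained from four vertex correspondences by solving an 8×8 linear system. The following is a minimal sketch in Python; the function names are illustrative, not from the paper:

```python
import numpy as np

def homography_coeffs(src, dst):
    """Solve for the eight coefficients b1..b8 of the 2D-2D perspective
    transform  s = (b1*p + b2*q + b3) / (b7*p + b8*q + 1),
               t = (b4*p + b5*q + b6) / (b7*p + b8*q + 1)
    from four (p, q) -> (s, t) point correspondences."""
    A, y = [], []
    for (p, q), (s, t) in zip(src, dst):
        # each correspondence contributes two linear equations
        A.append([p, q, 1, 0, 0, 0, -p * s, -q * s]); y.append(s)
        A.append([0, 0, 0, p, q, 1, -p * t, -q * t]); y.append(t)
    return np.linalg.solve(np.array(A, float), np.array(y, float))

def apply_homography(b, p, q):
    """Map one pixel (p, q) with coefficients b1..b8."""
    w = b[6] * p + b[7] * q + 1.0
    return ((b[0] * p + b[1] * q + b[2]) / w,
            (b[3] * p + b[4] * q + b[5]) / w)
```

Because the transform has eight degrees of freedom and each point pair constrains two, four pairs (with no three points collinear) determine it exactly.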

Scanning the ROI
In the proposed TDSM, the ROI is set on the intersecting line. To scan the ROI, we produce two virtual planes A and B so that their intersecting line passes through the point we want to observe. The procedure for pre-focusing using the virtual planes is as follows.
Step 1: Select the focusing point for the ROI.
Step 2: Select four points on the virtual plane A so that they surround the focusing point.
Step 3: Calculate the positions of the four surrounding points in the images $I_l$ and $I_c$ by using (1) and (2).
Step 4: Calculate the coefficients $b_{li}\ (i = 1 \sim 8)$ from the four corresponding point pairs.
Step 5: Select four points on the virtual plane B in the same manner as Step 2.
Step 6: Calculate the positions of the four surrounding points in the images $I_r$ and $I_c$ by using (3) and (2).
Step 7: Calculate the coefficients $b_{ri}\ (i = 1 \sim 8)$ in the same manner as Step 4.
Step 8: Produce the images $I_{lc}$ and $I_{rc}$ using the above coefficients.
The object is detected at the focusing point when there is no disparity in the ROI area between $I_{lc}$ and $I_{rc}$. The disparity is evaluated by using an optical flow technique.
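The steps above can be sketched in Python as follows, assuming each camera's 3D-2D transformation takes the standard 11-coefficient form implied by (1)–(3); the helper names and the toy coefficient values in the usage below are our illustrative assumptions:

```python
import numpy as np

def project(a, x, y, z):
    """3D-2D perspective transform with 11 coefficients a[0..10]
    (a1..a11 in the text): pixel = linear(x,y,z) / (a9*x + a10*y + a11*z + 1)."""
    w = a[8] * x + a[9] * y + a[10] * z + 1.0
    return ((a[0] * x + a[1] * y + a[2] * z + a[3]) / w,
            (a[4] * x + a[5] * y + a[6] * z + a[7]) / w)

def prefocus_coeffs(a_src, a_c, plane_pts):
    """Steps 2-4 (or 5-7): project the four 3D points surrounding the
    focusing point on a virtual plane into the source and center images,
    then solve for the eight 2D-2D coefficients that register the source
    image onto the center image."""
    src = [project(a_src, *P) for P in plane_pts]
    dst = [project(a_c, *P) for P in plane_pts]
    A, y = [], []
    for (p, q), (s, t) in zip(src, dst):
        A.append([p, q, 1, 0, 0, 0, -p * s, -q * s]); y.append(s)
        A.append([0, 0, 0, p, q, 1, -p * t, -q * t]); y.append(t)
    return np.linalg.solve(np.array(A, float), np.array(y, float))

def register_point(b, p, q):
    """Apply the resulting 2D-2D transform to one pixel."""
    w = b[6] * p + b[7] * q + 1.0
    return ((b[0] * p + b[1] * q + b[2]) / w,
            (b[3] * p + b[4] * q + b[5]) / w)
```

For any 3D point lying on the virtual plane, projecting with the source camera and then applying `register_point` reproduces the center-camera projection, which is exactly the pre-focusing property.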

Experiments and discussion
Web cameras, HD WEBCAM C270 (Logicool), were used in the experiment. The image size was 1280 × 720 pixels. We used two laser range finders, GLM70000 Professional (BOSCH), for determining the 3D position of a point in the global coordinate system.

Camera calibration
We installed three cameras, taking the center camera as the origin of the global coordinate system. Figure 1 shows the measurement environment for the 3D coordinates. The three cameras were arranged at intervals of 10 cm. We measured the 3D coordinates of a corner point on a checkerboard plane using the two laser range finders with a baseline length of 233 cm, as shown in Fig. 1. Changing the position of the checkerboard, we observed many points on it while measuring the distances from the positions A and B, respectively. The 3D coordinates of a point K (x = x_k, z = z_k) were obtained from the lengths of the three sides of the triangle ABK. To avoid rank reduction in calculating the coefficients of the perspective transformation, we selected 19 independent points. The coefficients of the 3D-2D perspective transformation for each camera were calculated from them; thus, the camera calibrations were completed. The coefficients are listed in Table 1.
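The recovery of K from the three side lengths of triangle ABK follows from the law of cosines. A sketch, placing A at the origin and B on the x-axis at the baseline distance (an illustrative frame, not the paper's actual coordinate system, whose origin is the center camera):

```python
import math

def triangulate(baseline, d_a, d_b):
    """Recover the (x, z) coordinates of point K from the three side
    lengths of triangle ABK, with A at the origin and B at (baseline, 0).
    The law of cosines, d_b^2 = d_a^2 + baseline^2 - 2*d_a*baseline*cos(A),
    yields the angle at A."""
    cos_a = (d_a**2 + baseline**2 - d_b**2) / (2.0 * d_a * baseline)
    x = d_a * cos_a
    # clamp guards against tiny negative values from rounding
    z = d_a * math.sqrt(max(0.0, 1.0 - cos_a**2))
    return x, z
```

For example, with the 233 cm baseline of the experiment, the two measured distances uniquely determine the in-plane position of each checkerboard corner.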

Scanning the pre-determined ROIs
We predetermined nine focusing points Q_1 to Q_9 on an elliptical path, and scanned them by changing the coefficient set of the perspective transformation, as shown in Fig. 2. There were persons at the points Q_1, Q_3 and Q_5, and no one at the other six places. We theoretically calculated nine sets of the coefficients $b_{li}\ (i = 1 \sim 8)$ in (4) and $b_{ri}\ (i = 1 \sim 8)$ in (5) for pre-focusing at the ROIs. One of these sets is listed in Table 2. Figure 3(a) shows the color composite of the originally observed images, in which the left image is assigned to red and the right image to green and blue. We can see that the disparities of the objects depend on their distances from the cameras. Figure 3(b) shows the result of focusing at the point Q_1 (x = 0, z = 300) using the theoretically generated coefficients of the 2D-2D perspective transformation. Only the person at the point Q_1 had no disparity, while the two persons at the other positions had disparities.
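The color composite of Fig. 3(a) can be reproduced with a few lines of NumPy; this sketch assumes single-channel left and right images of equal size:

```python
import numpy as np

def color_composite(left_gray, right_gray):
    """Assign the left image to the red channel and the right image to the
    green and blue channels; zero-disparity regions then appear gray, while
    disparities show up as red/cyan fringes."""
    return np.stack([left_gray, right_gray, right_gray], axis=-1)
```

Viewing such a composite gives an immediate visual check of where the pre-focusing has cancelled the disparity.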
To determine whether a person exists in the focused ROI, we evaluated the disparities between the transformed stereo pair images by using the 1D optical flow. In the TDSM, the three cameras were installed at the same height, so the stereo pair images had only horizontal disparities; therefore, we calculated only the 1D optical flow. We detected a person when the optical flow was positive and less than 1 pixel. Figure 4 shows the result of evaluating the disparities; white areas are those where the condition above was satisfied. In Fig. 4, we can see that the center person at the pre-focused point Q_1 was well extracted, but the other two persons at the points Q_3 and Q_5 were not. Figure 5 shows the results for other focusing points, where the disparities after focusing are indicated on the left-hand side and the detected person on the right. In Figs. 5(b) and (c), the persons at the focusing points Q_3 and Q_5 were well extracted, respectively, while no person was detected at Q_2 or Q_8, as shown in (a) and (d). The white areas seen in Figs. 5(a) and (d) are considered to be noise due to a periodic pattern in clothing.
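The 1D disparity evaluation can be sketched as the horizontal analogue of the Lucas-Kanade normal equation; the function names are ours, and the paper does not specify which 1D flow estimator was used:

```python
import numpy as np

def horizontal_flow(patch_a, patch_b):
    """Estimate a single horizontal displacement (in pixels) between two
    corresponding patches: the 1D Lucas-Kanade normal equation,
    u = -sum(Ix*It) / sum(Ix*Ix)."""
    a = patch_a.astype(float)
    Ix = np.gradient(a, axis=1)       # horizontal intensity gradient
    It = patch_b.astype(float) - a    # inter-image (temporal) difference
    return -np.sum(Ix * It) / (np.sum(Ix * Ix) + 1e-9)

def person_detected(flow):
    """Detection rule from the experiment: residual flow positive and
    less than 1 pixel."""
    return 0.0 < flow < 1.0
```

Because the cameras share the same height, only this single horizontal component needs to be estimated, which is what makes the evaluation cheap compared with full 2D matching.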

Determination of the ROI size
To design a person detection system using the proposed TDSM, we experimentally obtained the relation between the camera interval and the size of the ROI, i.e., its extension in both the x and z directions (see Fig. 2). We defined the extension as the area in which the disparities between the images captured by the left and right cameras were less than one pixel; the disparities were evaluated by the 1D optical flow. The camera intervals were 10 cm, 20 cm and 35 cm. We pre-focused at a fixed point (x = 0, z = 400 cm). For measuring the 1D optical flow, we used a random dot pattern on a plane, and changed its distance from 350 cm to 450 cm in 10 cm steps. In order to evaluate the disparities accurately, a watching window was set on the random dot pattern, as shown in Fig. 6. The ROI was set along the vertical line crossing the center of the window. The 1D optical flow was evaluated over the whole window and cumulated vertically to obtain a stable distribution. The ROI extension was determined from the positions at which the optical flow exceeded one pixel in length. We repeated the procedure while changing the distance, and obtained the extension in the depth direction. Figure 7 shows the distributions of the 1D optical flow for camera intervals of 10, 20 and 35 cm, respectively. The vertical axes show a measure of the goodness of focusing, where smaller disparities give larger values. Areas in which the disparity is within 1 pixel are indicated by yellow tiles. From our definition, i.e., a disparity of less than 1 pixel, we obtained ROI sizes of 117 × 20 cm² for the camera interval of 10 cm, 100 × 10 cm² for 20 cm, and 37 × 10 cm² for 35 cm, where the actual depth size for the 35 cm interval was estimated to be less than 10 cm. If the optical flow could be measured more accurately, the ROI would be smaller than these estimates, so a new definition of the ROI will be required.
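Extracting the ROI extension from the vertically cumulated flow amounts to finding the contiguous span, around the best-focused column, where the disparity stays under the 1-pixel limit. A sketch; the exact search rule and the function names are our illustrative assumptions, not the paper's procedure:

```python
import numpy as np

def roi_extent(flow_columns, x_positions, limit=1.0):
    """Given per-column disparities (1D flow cumulated vertically and
    averaged) and the corresponding x positions, return the endpoints of
    the widest contiguous span around the best-focused column in which
    the absolute disparity stays below `limit` pixels."""
    flow = np.asarray(flow_columns, float)
    center = int(np.argmin(np.abs(flow)))  # best-focused column
    lo = center
    while lo > 0 and abs(flow[lo - 1]) < limit:
        lo -= 1
    hi = center
    while hi < len(flow) - 1 and abs(flow[hi + 1]) < limit:
        hi += 1
    return x_positions[lo], x_positions[hi]
```

Repeating this for each target-plane distance gives the extension of the ROI in the depth direction as well.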
The estimated ROIs shown in Fig. 7 may have some distortions in shape. We consider that these distortions come from misalignment of the camera directions.

Conclusions
In this paper, we proposed a three-dimensional spatial scanning method, the TDSM, to detect an object or a person at any 3D region using multiple cameras without a stereo matching process. In the proposed TDSM, pre-focusing is performed by applying appropriate 2D-2D perspective transformations to the stereo pair images so that an object at the pre-determined region has no disparity between them. To extract the object, the TDSM evaluates the disparities by using a 1D optical flow technique. The TDSM performs a camera calibration based on the 3D-2D perspective transformation so that it can scan the 3D space freely; once the calibration is performed, the TDSM theoretically generates the sets of coefficients for pre-focusing at any ROI. In addition, we experimentally obtained the relation between the camera interval and the ROI size. Future work includes the development of a more complete 3D scanning system using the multi-camera system and its application to a pedestrian tracking system.