A Three-Dimensional Spatial Scanning Method for Detecting An Object Using Multiple Cameras

A Three-Dimensional Scanning Method (TDSM) was proposed to detect an object or a person in a 3D Region Of Interest (ROI) without a stereo matching process. TDSM consisted of two processes: scanning the ROI within the field of view of multiple cameras, and detecting the object in the ROI. The former was realized by two theoretically derived sets of coefficients for perspective transformations; one canceled disparities on a plane between the left and center images, and the other canceled those on another plane between the right and center images. The ROI was formed at the intersection of the two planes. The latter was performed by detecting disparities between the left and right images, using one-dimensional optical flow. TDSM was successfully applied to detect a person walking in the field of view of the multiple cameras.


Introduction
Various types of autonomous cars have recently attracted attention and have been developed and tested on actual roads. They constantly sense their surroundings to reach their destinations safely. In the sensing process, sensors such as infrared lasers, cameras, and millimeter-wave radars (1) are used. Computer systems are also used to interpret the sensor outputs and to determine the next action. In fact, TOYOTA has already developed a practical Pre-Crash Safety System (2). Camera-based systems have also been developed, e.g., SUBARU's EyeSight and HONDA's Intelligent Night Vision system (3). These systems commonly obtain 3D road information not only for vehicle control but also to detect pedestrians entering the street (4), in order to avoid car-human accidents. The systems, however, need a time-consuming stereo matching process (5) to obtain 3D information. There are at least two approaches to this problem. One is to use a sub-system, such as a GPGPU, to speed up the processing (6). The other is to adopt a smart method that avoids the stereo matching process altogether. The former solution is straightforward but expensive, while the latter is low-cost and flexible. The authors have proposed such a method, named the pre-focusing method (PFM) (7), to detect a pedestrian at a pre-fixed 3D ROI. In the PFM, a calibration is needed for each pre-focused ROI. A smart method that avoids this calibration has been required.
In this study, we propose a 3D spatial scanning method, TDSM, which allows the ROI to be placed at an arbitrary position. Once a camera calibration has been performed, TDSM automatically generates the sets of coefficients for the perspective transformations that produce the ROI at any point in the field of view of the multiple cameras.
Disparities between the left and right camera images are evaluated by using an optical flow technique (8). We find an object or a person in the ROI when no disparity is detected in the ROI.

Methodology
The proposed method TDSM theoretically derives the coefficients for pre-focusing a 3D ROI at any point in the field of view of the multiple cameras, and then determines whether an object exists there.

Creation of ROI
Suppose three cameras are linearly located, and assume their optical axes are parallel with no relative rotation. Each camera has already been calibrated, so each has its own coefficients for the perspective transformation:

p = (al1 x + al2 y + al3 z + al4) / (al9 x + al10 y + al11 z + 1),
q = (al5 x + al6 y + al7 z + al8) / (al9 x + al10 y + al11 z + 1),   (1)

where ali (i = 1~11) are the coefficients of the perspective transformation from a point (x, y, z) in the global coordinates to a pixel (p, q) in the image captured by the left camera. The transformations (2) and (3) take the same form with coefficients aci (i = 1~11) and pixel (s, t) for the center camera, and coefficients ari (i = 1~11) and pixel (u, v) for the right camera, respectively. When these cameras Cl, Cc and Cr observe a rectangle on a plane A at a distance, their images have some disparities among them. Suppose a perspective transformation from (p, q) to (s, t):

s = (bl1 p + bl2 q + bl3) / (bl7 p + bl8 q + 1),
t = (bl4 p + bl5 q + bl6) / (bl7 p + bl8 q + 1),   (4)

where the coefficients bli (i = 1~8) are calculated from the four pairs of corresponding vertices. Using this transformation, we can register the left camera image onto the center one. This means that an object on the plane A has no disparity between the center image and the left camera image after transformation by (4). Thus, cameras Cl and Cc are pre-focused on the plane A. We set another plane B and apply the same process to cameras Cr and Cc, using a perspective transformation (5) of the same form as (4), where bri (i = 1~8) are the coefficients of the perspective transformation from (u, v) to (s, t). They cancel the disparities of an object on the plane B between the center camera image and the right camera image transformed by (5). Now we have two pre-focused camera pairs: Cl and Cc on the plane A, and Cr and Cc on B. Suppose the planes A and B have an intersecting line, and suppose an object placed on the line is observed by all cameras at the same time. Let Il, Ic and Ir be the images captured by cameras Cl, Cc and Cr, respectively, and let Ilc and Irc be the images transformed from Il by (4) and from Ir by (5), respectively. The object observed in Ilc and that in Ic have no disparity between them, because the object is on the plane A.
Similarly, the object observed in Irc and that in Ic also have no disparity between them, because the object is also on the plane B. In this situation, the object in Ilc and that in Irc have no disparity between them.
Only an object placed on the intersecting line of the planes A and B satisfies this condition. Thus, the ROI is created on the intersecting line of the planes A and B.
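As a concrete illustration, the eight coefficients bli (or bri) referenced in (4) and (5) can be obtained by solving a small linear system built from the four corresponding point pairs. The following is a minimal sketch, not the authors' implementation; the function names are ours:

```python
import numpy as np

def fit_perspective(src, dst):
    """Fit b1..b8 so that each (p, q) in src maps to (s, t) in dst:
       s = (b1*p + b2*q + b3) / (b7*p + b8*q + 1)
       t = (b4*p + b5*q + b6) / (b7*p + b8*q + 1)
    Each equation is linearized as b1*p + b2*q + b3 - b7*p*s - b8*q*s = s."""
    A, rhs = [], []
    for (p, q), (s, t) in zip(src, dst):
        A.append([p, q, 1, 0, 0, 0, -p * s, -q * s]); rhs.append(s)
        A.append([0, 0, 0, p, q, 1, -p * t, -q * t]); rhs.append(t)
    b, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(rhs, float),
                            rcond=None)
    return b  # b1..b8

def apply_perspective(b, p, q):
    """Transform one pixel with the fitted coefficients."""
    w = b[6] * p + b[7] * q + 1.0
    return ((b[0] * p + b[1] * q + b[2]) / w,
            (b[3] * p + b[4] * q + b[5]) / w)
```

Four point pairs give eight equations for the eight unknowns, so the fit is exact when no three of the four points are collinear.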

Scanning the ROI
In the proposed method TDSM, the ROI is set on the intersecting line. To scan the ROI, we have to produce two virtual planes A and B so that their intersecting line passes through the desired point. The procedure for pre-focusing with the virtual planes is as follows:
Step 1: Select the focusing point for the ROI.
Step 2: Select four points on the virtual plane A so that they surround the focusing point.
Step 3: Calculate the positions of the four surrounding points in the images Il and Ic by using (1) and (2).
Step 4: Calculate the coefficients bli (i = 1~8) from the four corresponding point pairs.
Step 5: Select four points on the virtual plane B in the same manner as Step 2.
Step 6: Calculate the positions of the four surrounding points in the images Ir and Ic by using (3) and (2).
Step 7: Calculate the coefficients bri (i = 1~8) in the same manner as Step 4.
Step 8: Produce the images Ilc and Irc using the above coefficients.
The object is detected at the focusing point when there is no disparity in the ROI area between Ilc and Irc. The disparity is evaluated by using an optical flow technique.
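The steps above can be sketched as follows. This is a hedged illustration, not the authors' code: `project` implements the form of (1)-(3), and the calibration coefficient arrays `a_src` and `a_ctr` are assumed to be length-11 sequences as in Table 1.

```python
import numpy as np

def project(a, x, y, z):
    """Perspective transformation of the form (1)-(3): global point -> pixel."""
    w = a[8] * x + a[9] * y + a[10] * z + 1.0
    return ((a[0] * x + a[1] * y + a[2] * z + a[3]) / w,
            (a[4] * x + a[5] * y + a[6] * z + a[7]) / w)

def prefocus_coeffs(a_src, a_ctr, plane_pts):
    """Steps 2-4 (or 5-7): project four points on a virtual plane into the
    source and center images, then fit the coefficients b1..b8 of (4)/(5)."""
    src = [project(a_src, *P) for P in plane_pts]
    dst = [project(a_ctr, *P) for P in plane_pts]
    A, rhs = [], []
    for (p, q), (s, t) in zip(src, dst):
        A.append([p, q, 1, 0, 0, 0, -p * s, -q * s]); rhs.append(s)
        A.append([0, 0, 0, p, q, 1, -p * t, -q * t]); rhs.append(t)
    b, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(rhs, float),
                            rcond=None)
    return b
```

Step 8 then amounts to warping Il and Ir with the fitted coefficients, for example with OpenCV's cv2.warpPerspective.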

Experiments and discussion
We used web cameras, HD WEBCAM C270 (Logicool), as the stereo cameras. The image size was 1280×720. We used a laser range finder, GLM70000 Professional (BOSCH), for measuring points in the global coordinates.

Camera calibration
We installed the three cameras, and the center camera was taken as the origin of the global coordinate system. The distance between the left and center cameras and that between the right and center cameras were both 10 [cm]. A corner point of a checkerboard was measured with the laser range finder as shown in Fig. 1. The coordinates zk and xk of the point K in the global coordinates were obtained from the lengths of the three sides of the triangle ABP. The board was repositioned, and corner points were measured at a total of 19 places. The coefficients of the perspective transformation for each camera are listed in Table 1.
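The eleven coefficients per camera can be recovered from the measured correspondences by ordinary least squares. The sketch below uses a standard DLT-style linearization; the formulation and names are ours, not the paper's code:

```python
import numpy as np

def calibrate(world, pixels):
    """world: list of (x, y, z); pixels: list of (p, q).
    Returns a1..a11, fitted so that
    p = (a1 x + a2 y + a3 z + a4) / (a9 x + a10 y + a11 z + 1),
    q = (a5 x + a6 y + a7 z + a8) / (a9 x + a10 y + a11 z + 1).
    Each correspondence contributes two linear equations in a1..a11."""
    A, rhs = [], []
    for (x, y, z), (p, q) in zip(world, pixels):
        A.append([x, y, z, 1, 0, 0, 0, 0, -p * x, -p * y, -p * z]); rhs.append(p)
        A.append([0, 0, 0, 0, x, y, z, 1, -q * x, -q * y, -q * z]); rhs.append(q)
    a, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(rhs, float),
                            rcond=None)
    return a
```

With 19 points the system (38 equations, 11 unknowns) is overdetermined, which averages out measurement noise; the points must not all lie on one plane.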

Focusing the pre-determined ROI
We selected three focusing points and theoretically calculated three sets of coefficients for pre-focusing on these regions. One of them is listed in Table 2. Fig. 2(a) shows the color composite of the originally observed images, in which the left image is assigned to red and the right image to green and blue. Three persons stood at (x = 0, z = 300), (x = -50, z = 400) and (x = 100, z = 500). We can see that the disparities of the objects depend on their distances from the cameras. Fig. 2(b) shows the result of focusing at (x = 0, z = 300) using the obtained coefficients of the perspective transformation. We can see that only the person at the center had no disparity, while the other persons still had disparities. Fig. 3 shows the results of focusing at the other regions. The persons at the different regions were respectively focused and had no disparities.
Fig. 2. Color composite images before and after focusing: (b) focused stereo pair images.
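A composite like Fig. 2 can be produced with a one-line channel assignment. This is a minimal sketch under the red/green-blue assignment stated above; the function name is ours:

```python
import numpy as np

def color_composite(left_gray, right_gray):
    """Assign the left image to red and the right image to green and blue.
    Zero-disparity regions appear gray; disparities show as color fringes."""
    return np.dstack([left_gray, right_gray, right_gray])
```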

Extraction of the focused area
To detect the person at the focused region, we evaluated the disparity by using the optical flow technique. A pixel was regarded as focused when its optical flow was 0 or more and less than 1. Fig. 4 shows the result of the evaluation. The white regions in Fig. 4 were evaluated as focused areas. We can see that the optical flow detects the person at the focused ROI. We also compared the calculation cost of the proposed method TDSM with that of template matching. Fig. 5 shows the disparity map. To generate the map, matching had to be performed over all pixels. Furthermore, to detect the object from the disparity map, template matching with template images of various sizes was required. The calculation cost of the template matching was therefore much higher than that of the proposed method TDSM.
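To make the disparity test concrete, the following NumPy-only sketch substitutes simple one-dimensional block matching for the optical-flow step (a stand-in of ours; the paper does not detail its flow implementation). Blocks whose best horizontal shift has magnitude below 1 pixel are marked as focused:

```python
import numpy as np

def horizontal_shift(block_l, strip_r, max_shift):
    """Best horizontal displacement (by SSD) of block_l inside strip_r."""
    errs = [np.sum((strip_r[:, s:s + block_l.shape[1]].astype(float)
                    - block_l.astype(float)) ** 2)
            for s in range(2 * max_shift + 1)]
    return int(np.argmin(errs)) - max_shift

def focused_mask(Ilc, Irc, block=8, max_shift=4):
    """True where the two pre-focused images show (near-)zero disparity."""
    h, w = Ilc.shape
    mask = np.zeros((h // block, w // block), bool)
    for i in range(h // block):
        for j in range(w // block):
            y, x = i * block, j * block
            if x - max_shift < 0 or x + block + max_shift > w:
                continue  # search window would leave the image
            blk = Ilc[y:y + block, x:x + block]
            strip = Irc[y:y + block, x - max_shift:x + block + max_shift]
            mask[i, j] = abs(horizontal_shift(blk, strip, max_shift)) < 1
    return mask
```

In practice a dense flow routine such as OpenCV's cv2.calcOpticalFlowFarneback could fill the same role with sub-pixel precision.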

Conclusions
In this study, we proposed a 3D spatial scanning method, TDSM, to detect an object or a person in a 3D ROI without a stereo matching process. Once a camera calibration is performed, TDSM theoretically generates the sets of coefficients for the perspective transformations that produce the ROI at any point in the field of view of the multiple cameras. To detect an object at the pre-determined ROI, TDSM evaluates the disparities between the pre-focused stereo pair images by using an optical flow technique. Future work includes the development of a system that tracks a pedestrian detected via TDSM.

Fig. 3. Results of focusing at the other two regions.

Fig. 4. Result of the evaluation by the optical flow.

Table 1. The coefficients of the perspective transformation for each camera.