Visual Odometry from Floor Images by using Accelerated KAZE Features

For self-position estimation in an autonomous robot, the robot's two-dimensional movement must be measured. This paper therefore proposes a self-position estimation method that uses floor images photographed with a CCD camera. The self-position is estimated from keypoints detected in images photographed before and after movement, with AKAZE used for feature extraction. By predicting motion from the movement of past frames, the position of each keypoint one frame ahead can be predicted. Using this movement prediction of the feature points, a method is proposed that processes faster than the conventional feature matching method.


Introduction
For self-position estimation in an autonomous robot, the robot's two-dimensional movement must be measured. One measurement method is wheel odometry, in which rotary encoders record the rotation of the left and right wheels and the movement is determined from those rotation quantities. However, wheel odometry accumulates error because the effective wheel diameter changes as dirt adheres to the wheels, and because of wheel sliding (1) and loss of contact with the floor. Hence, a self-position estimation method that uses captured images instead of wheel odometry is proposed. Image-based self-position estimation for an autonomous robot uses either template matching or keypoints. Examples of template matching applied to self-position estimation include Nagai's downward-facing camera method (2), Savan's ground vehicle method (3), and Andrew's real-time stereo visual odometry method (4). Template matching, however, requires the illumination to be controlled: when the color changes because of illumination disturbance, image matching cannot be performed on the floor image (5). Thus, a self-position estimation method using keypoints, which are robust to such color changes, is proposed. Applied examples of self-position estimation using keypoints include the research of Mark (6), Chen (7), Raul (8,9), Stephen (10), Brian (11), Kurt (12), Timothy (13), Christian (14), Mikael (15), and Julian (16). To detect the keypoints, Mark uses chessboard corners and then the Lucas-Kanade method to track them (6). However, this requires a chessboard to be placed on the floor and cannot be applied in a real environment. The studies of Chen, Raul, Stephen, Brian, Kurt, Timothy, Christian, Mikael, and Julian photograph landmarks such as buildings or structures, then extract keypoints at the corners or edges of the landmark region using a descriptor such as SIFT, SURF, FAST, or ORB; the image features are extracted with the descriptor (7)-(16). These methods then track the keypoints by comparing the descriptors before and after movement. However, since they rely on landmark keypoints, the landmarks may be lost from view under lighting changes or other disturbances, and the calculation cost of computing the feature descriptors is high (2). Therefore, this work uses AKAZE features, which are highly robust to changes of brightness and rotation. A method is proposed that uses only the keypoints of the floor image and predicts the coordinates of the keypoints in the next frame from the movement result of the previous frame. By associating the extracted keypoints in this way, the method reduces the calculation cost.

System Summary
For the measurement of the movement, a monocular camera pointed perpendicularly at the floor is used. As shown in Fig. 1, images are photographed while the camera moves parallel to the floor, and self-position estimation is performed from the continuously acquired images. From two consecutively photographed frames, the feature keypoints are extracted; the keypoints of the two images are then matched, and the corresponding points between the images are determined. As shown in Fig. 1, with the camera facing perpendicularly to the floor, let d be the height between the camera and the floor, θ the half view angle of the camera, and w the number of pixels in the x direction of the image sensor. When a considered point (x_f, y_f) moves to (x_f+1, y_f+1), the movements in the x and y directions are respectively

Δx = (2d·tanθ / w)(x_f+1 − x_f),  Δy = (2d·tanθ / w)(y_f+1 − y_f),

where 2d·tanθ / w is the width of floor imaged by one pixel.
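The pixel-to-millimetre conversion above can be written as a short sketch; the function name and the example numbers (100 mm height, 30° half angle, 640 px sensor width) are illustrative, not taken from the paper.

```python
import math

def pixel_to_mm(dx_px, dy_px, d_mm, half_angle_deg, w_px):
    """Convert a pixel displacement on the image sensor into floor-plane
    movement in mm: the floor width imaged per pixel is 2*d*tan(theta)/w."""
    scale = 2.0 * d_mm * math.tan(math.radians(half_angle_deg)) / w_px
    return dx_px * scale, dy_px * scale

# A 10 px shift at 100 mm height, 30 deg half angle, 640 px width:
dx_mm, dy_mm = pixel_to_mm(10.0, 0.0, 100.0, 30.0, 640)
print(round(dx_mm, 3))  # → 1.804
```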

Prediction of Next Frame
For self-position estimation using keypoints, performing matching with the feature quantities obtained from the keypoints requires calculation cost. Instead, without using the features of the keypoints, the next position of each keypoint is predicted from the movement calculated between the current frame and the frame before it, and a method that reduces the calculation cost of associating the keypoints of the image photographed after movement is proposed. When the camera capturing the floor image moves gently and parallel to the floor so that its motion has inertia, and the frame rate is assumed to be sufficiently fast, the movement between the current frame and the next frame can be regarded as equal to the movement between the previous frame and the current frame. Therefore, with the camera moving along the x-axis and y-axis, let (x_f, y_f) be the coordinates of a keypoint in the current frame, vx_f and vy_f the movements along the x-axis and y-axis between the previous frame and the current frame, and θ_f the angle of rotation when rotary motion is performed. The predicted position (x′_f+1, y′_f+1) of the keypoint in the next frame is then

x′_f+1 = x_f + vx_f·cosθ_f − vy_f·sinθ_f,
y′_f+1 = y_f + vx_f·sinθ_f + vy_f·cosθ_f.
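The prediction step can be sketched as follows; the rotation-composition form is a reconstruction consistent with the definitions above, and the function name is illustrative.

```python
import math

def predict_next(x_f, y_f, vx_f, vy_f, theta_f):
    """Predict a keypoint one frame ahead: translate by the previous
    inter-frame movement (vx_f, vy_f) rotated by theta_f (radians)."""
    c, s = math.cos(theta_f), math.sin(theta_f)
    return (x_f + vx_f * c - vy_f * s,
            y_f + vx_f * s + vy_f * c)

# Pure forward motion along the x-axis (theta_f = 0), as in the experiment:
print(predict_next(100.0, 50.0, 4.0, 0.0, 0.0))  # → (104.0, 50.0)
```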
In this experiment, only movement along the x-axis in the forward direction is considered, so the prediction reduces to

x′_f+1 = x_f + vx_f,  y′_f+1 = y_f.
After the prediction, the distance between each predicted point and the keypoints detected in the next frame (the ground truth) is measured, and each predicted keypoint of the current frame is associated with the nearest detected keypoint. At this time, predicted points falling outside the next frame are excluded. Thus, a predicted point may be associated with its true corresponding coordinate, but an unnecessary point may also be associated with that coordinate. In this way, the extracted keypoints of the current frame and the next frame can be associated without using their features.
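The nearest-neighbour association with out-of-frame exclusion can be sketched as below; the brute-force search and all names are illustrative (a KD-tree would scale better for many keypoints).

```python
import math

def associate(predicted, detected, width, height):
    """Pair each in-frame predicted point with the nearest detected
    keypoint of the next frame; returns index pairs into both lists."""
    pairs = []
    for i, (px, py) in enumerate(predicted):
        if not (0 <= px < width and 0 <= py < height):
            continue  # prediction fell outside the next frame: excluded
        j = min(range(len(detected)),
                key=lambda k: math.hypot(detected[k][0] - px,
                                         detected[k][1] - py))
        pairs.append((i, j))
    return pairs

preds = [(10.0, 10.0), (-5.0, 3.0), (100.0, 40.0)]
dets = [(11.0, 9.0), (98.0, 41.0)]
print(associate(preds, dets, 640, 480))  # → [(0, 0), (2, 1)]
```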

Removal of Incorrectly Associated Points
As shown in Section 2.2, the keypoints of the current frame can be associated with those of the next frame without computing their features. However, the proposed method sometimes associates keypoints incorrectly, because some keypoints of the current frame may vanish in the next frame when the lighting changes. These incorrectly associated points affect the measurement result. The proposed method therefore removes wrongly associated points using the standard deviation. First, the x-axis and y-axis movements of all associated point pairs between the current and next frames are calculated. Second, the mean and standard deviation of these movements are calculated. Third, with a_m the mean of the x- or y-axis movements, s_m their standard deviation, and m the movement measured from one associated pair, a pair is judged correct when it satisfies a_m − s_m ≤ m ≤ a_m + s_m. Finally, the mean a_n and standard deviation s_n are recomputed from the surviving pairs, and each surviving movement n is tested again with the threshold a_n − s_n ≤ n ≤ a_n + s_n. In this way, the precision of the measurement result is enhanced.
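The two-pass one-sigma filter can be sketched as follows, applied separately to the x- and y-axis movements; the function name and example values are illustrative.

```python
import statistics

def filter_outliers(movements):
    """Keep movements m with a - s <= m <= a + s, where a and s are the
    mean and (population) standard deviation; the test is applied twice,
    the second time on the survivors of the first pass."""
    def one_pass(vals):
        if len(vals) < 2:
            return vals
        a, s = statistics.mean(vals), statistics.pstdev(vals)
        return [m for m in vals if a - s <= m <= a + s]
    return one_pass(one_pass(movements))

moves = [4.0, 4.0, 4.0, 4.1, 12.0]  # 12.0 is a wrongly associated pair
print(filter_outliers(moves))  # → [4.0, 4.0, 4.0]
```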

Experimental Environment and Method
For the experimental setup, an STC-SB32POEHS camera (produced by OMRON SENTECH) was used as the CCD camera capturing the floor image, with an MV-0614N lens (produced by NS-Lighting) attached. The camera was installed facing downward, 100 mm vertically above the floor. The proposed system encloses the camera in a light-shielding box 296 mm wide and 236 mm deep, installed with a 3 mm gap from the floor, and uses a linear guide to move the system linearly. For the lighting, two OPB-20015W2 LEDs (produced by OPTEX FA) were used, installed on both sides of the box; their radiation is directed parallel to the floor and toward the center of the image, so that asperities and scratches of the floor are illuminated. Figure 2 shows the experimental environment of the proposed system. As a subject, the P-72 floor tile (produced by TAJIMA ROOFING) was used. Figure 3 shows a floor image captured with the proposed system, and Fig. 4 shows the keypoints detected from the captured floor image with the AKAZE descriptor. For the PC, an HP Compaq Elite 8300 CMT (produced by Hewlett-Packard, with an Intel Core i7-3770) was used to compute and estimate the movement. An MLS-50-1080C-2000 wire encoder (produced by MICROTECH LABORATORY) was used to measure the ground truth. In this experiment, floor images were photographed every 1 mm in the x-axis direction using the wire encoder. Images were then selected and tested with three kinds of movement models: one of equal speed and two using a sigmoid function that accelerates and decelerates. The three movement models experimented with are:

I) 200 mm by the equal-speed model, 4 mm interval per frame
II) 10 mm by the sigmoid-function model with gain 1.25
III) 10 mm by the sigmoid-function model with gain 1.75

First, the proposed method detected keypoints from the images before and after movement according to the movement models. Second, only for the first frame, features were computed from the images and keypoints to obtain an initial measurement result. Third, the features were matched and the movement was computed. Fourth, the movement of the next frame was predicted from the matching result. Finally, keypoints were detected from the subsequent frames and associated with each other using the predicted result. Figure 5 shows a captured floor image with lines describing the system movement computed between the images before and after movement; green lines show correctly associated points and red lines show incorrectly associated points. Figure 6 shows the result of removing from Fig. 5 the lines identified as incorrectly associated points by the standard deviation.

The equal-speed model estimated the movement 150 times, and the sigmoid-speed models 190 times. The mean movement per frame was calculated, and the mean estimation time of each movement model was measured.

Experimental Results
The relation between movement and frame for each of the three models is shown in Fig. 7(a) to (c), and the standard deviation together with the processing time of the three experimented models is shown in Fig. 8(a) to (c). According to Fig. 7(a) and (b), the measured movements of both methods follow the ground truth. However, according to Fig. 7(c), only the AKAZE-features method follows the ground truth; the AKAZE-features-with-estimation method does not. This problem is thought to be caused by the result of the previous frame: the proposed method estimated a movement of 0 mm in the next frame, and then associated keypoints incorrectly because other keypoints lay around the estimated point. To solve this problem, if the estimated point retains AKAZE features, checking the features around the estimated point would promise a more precise estimate. According to Fig. 8(a) to (c), the processing time of each model is faster than that of the conventional method. The experimental results show that if the odometry robot moves smoothly, the proposed system can estimate movement that follows the ground truth.
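The per-frame measurement procedure described above (predict, associate to the nearest keypoint, filter by standard deviation, average) can be condensed into one end-to-end sketch; keypoint detection is replaced by synthetic points, and every name here is illustrative rather than the authors' implementation.

```python
import math
import random
import statistics

def estimate_motion(points_prev, points_next, vx, vy):
    """One measurement step for pure x-axis motion: predict each previous
    keypoint by the last movement, pair it with the nearest next-frame
    keypoint, filter the per-pair movements twice by one standard
    deviation, and average the survivors."""
    moves = []
    for x0, y0 in points_prev:
        px, py = x0 + vx, y0 + vy  # predicted position one frame ahead
        nx, ny = min(points_next,
                     key=lambda p: math.hypot(p[0] - px, p[1] - py))
        moves.append(nx - x0)
    for _ in range(2):  # two-pass one-sigma outlier removal
        if len(moves) > 1:
            a, s = statistics.mean(moves), statistics.pstdev(moves)
            moves = [m for m in moves if a - s <= m <= a + s]
    return statistics.mean(moves)

random.seed(0)
pts = [(random.uniform(0, 600), random.uniform(0, 400)) for _ in range(20)]
shifted = [(x + 4.0, y) for x, y in pts]  # true motion: 4 mm along x
print(round(estimate_motion(pts, shifted, 4.0, 0.0), 3))  # → 4.0
```

With an exact 4 mm shift the prediction lands on the true corresponding point, so every pair survives the filter and the estimate recovers the motion.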