A Smartphone App for the Visually Impaired that Detects Obstacles and Obtains Their Distances

The authors have been developing a smartphone app to support visually impaired people when they walk outdoors. The app detects obstacles that hinder walking from RGB camera images and alerts the user to dangerous situations. It uses a CNN, a kind of deep learning technique, to detect obstacles, but it can detect only one obstacle per image. We are now implementing a multiple-obstacle detection function in the app. This paper presents a method for obtaining the distances to the obstacles found by the multiple-obstacle detection.


Introduction
In Japan, as of 2016, there were approximately 312,000 visually impaired people holding a physical disability certificate, and the number is predicted to increase year by year as the population grows and ages (1). Such people find daily life very inconvenient, especially when walking. Survey results indicate that about half of walking accidents are collisions, so a tool that properly detects obstacles ahead would be helpful. We developed smartphone apps that detect obstacles using a CNN (Convolutional Neural Network), a kind of deep learning (2)(3). Since this method only determines whether or not an obstacle appears in the image, it cannot tell where the obstacle is, and therefore the distance to the obstacle cannot be obtained. We are currently implementing, in the smartphone app, a function that detects multiple obstacles at the same time and obtains the distance to each of them. This paper shows how to measure the distances to the detected obstacles using the tilt angle and the angle of view of the smartphone camera; the usefulness of the method is shown by experimental results. Fig. 1 shows a flow diagram of the system. A visually impaired user wears a smartphone equipped with a camera at around chest height to capture RGB images of the scene ahead. The system runs as an application on the smartphone: it recognizes the forward situation from the captured RGB images and classifies it into categories using the CNN. If the situation is recognized as an obstacle that disturbs walking, the system notifies the user by sound and vibration.

Obstacle Detection
There are many kinds of obstacles that disturb walking on the road. In this study, we selected basic situations that should be detected, such as sidewalks, crosswalks, bicycles, and stairs. The CNN learns from a large amount of supervised training images and, given an input image, classifies an object included in the image into a category. The final layer of the CNN is a fully connected layer, and the ReLU function is used as the activation function of each neuron unit; the unit of the most likely candidate outputs the highest value. The output value v_i of each candidate neuron unit is converted to a value between 0 and 1 by the softmax function

p_i = exp(v_i) / Σ_{j=1}^{N} exp(v_j)

where v_i is the output value of the i-th candidate neuron unit and N is the number of candidates. The value p_i can be used as the probability of the i-th candidate.
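As a concrete illustration, the softmax conversion described above can be sketched in Python; the raw output values in the example are illustrative, not taken from the actual model:

```python
import math

def softmax(outputs):
    """Convert the raw output values of the N candidate neuron units
    into probabilities between 0 and 1 that sum to 1."""
    m = max(outputs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in outputs]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative raw outputs of three candidate units;
# the unit with the largest output yields the highest probability.
probs = softmax([2.0, 1.0, 0.5])
print([round(p, 3) for p in probs])
```

The subtraction of the maximum output before exponentiation does not change the result but avoids overflow for large output values.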
This study makes efficient use of Inception-v3, an already-trained CNN model. Inception-v3 was developed by Google and trained to classify images into 1,000 classes for the ILSVRC image-identification task (4). This study takes the structure of the Inception-v3 model, which has such good identification ability, and re-trains it to recognize the specific obstacles that need to be detected for walking support. Fig. 2 shows a diagram of the re-training flow; the re-training uses TensorFlow, an open-source machine learning framework developed by Google.

Experiments
The experiments were conducted during the day in a city area. The experimenter held a smartphone at chest position, about 140 cm above the ground, without fixing its angle. The experiment verified whether the app could detect target obstacles or situations while walking. Fig. 3 shows screenshots of situations in which the classification succeeded. All of these situations, (a) sidewalk, (b) crosswalk, (c) bicycle and (d) stairs, were classified successfully with high probability values greater than 0.8. The specifications of the smartphone are shown in Table 1; it is a mid-range model from 2017. The app can perform detection processing 3 to 4 times per second.
We prepared 400 to 2,000 images and divided them into two groups: 80% for training and 20% for testing. The training images underwent image processing for each category and were used to build a CNN model by re-training; the number of iterations was 4,000. Table 2 summarizes the experimental results. The correct-answer rate exceeds 80% in every category, which shows that the model is good.
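The 80%/20% split described above can be sketched as follows; the file names and the fixed random seed are illustrative:

```python
import random

def split_dataset(image_paths, train_frac=0.8, seed=0):
    """Shuffle the images for one category and split them into
    training and test sets (80%/20% in this study)."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = list(image_paths)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Example with 400 images, the lower end of the range used here.
train, test = split_dataset([f"img_{i:04d}.jpg" for i in range(400)])
print(len(train), len(test))  # 320 80
```

Shuffling before splitting avoids a biased test set when images were collected in order (e.g., all sidewalk images first).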

Why Multiple-Obstacle Detection
The basic CNN described above only determines whether there are obstacles in the image, so it cannot tell where they are. If multiple obstacles are ahead, this method detects only the single most likely one; even if another obstacle poses a danger, it is ignored. It is therefore necessary to detect multiple obstacles at the same time so that no danger is overlooked.

YOLO
YOLO (5) is an object-recognition algorithm based on a CNN, named from the acronym of "You Only Look Once". It can recognize multiple objects and acquire their regions on the image at the same time, and its processing time is shorter than that of similar methods based on the sliding-window approach or region proposals. Fig. 4 shows an example of multiple-obstacle detection using YOLO.

Aims
By using YOLO, the position of each recognized obstacle on the image can be known. Combined with the physical information of the smartphone at the time of shooting, the distance to each obstacle can be calculated. This chapter describes the method and shows its effectiveness by experiments.

Method of Obtaining the Distance to Obstacles
See Fig. 5, which shows the parameters for the calculation. When the height at which the smartphone is held is h and the angle from the vertical direction to the obstacle is θ, the distance d to the obstacle can be calculated by the following equation:

d = h tan θ    (3)

The angle θ can be calculated from the tilt angle θ_t of the smartphone (the angle of the camera's optical axis from the vertical), the vertical angle of view φ of the smartphone camera, and the position of the obstacle on the image:

θ = θ_t + arctan( ((H − 2y) / H) tan(φ / 2) )    (4)

where H is the height of the image in pixels and y is the vertical position of the obstacle on the image, measured from the top.

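Under the pinhole-camera model assumed here, the distance calculation can be sketched in Python; the function name, parameter names, and the example values are illustrative:

```python
import math

def distance_to_obstacle(h, tilt_deg, view_deg, y, img_h):
    """Distance d = h * tan(theta), where theta adds the angular
    offset of the obstacle's pixel row to the camera tilt angle."""
    tilt = math.radians(tilt_deg)
    half_view = math.radians(view_deg) / 2.0
    # Angular offset of pixel row y (0 = top of image) from the
    # optical axis under a pinhole-camera model.
    offset = math.atan((img_h - 2.0 * y) / img_h * math.tan(half_view))
    theta = tilt + offset
    if theta <= 0.0 or theta >= math.pi / 2.0:
        return float("inf")  # point is at or above the horizon
    return h * math.tan(theta)

# Example: phone held at 1.2 m, tilted 40 degrees from the vertical,
# 60-degree vertical angle of view, obstacle at the image centre.
print(round(distance_to_obstacle(1.2, 40, 60, 240, 480), 2))  # about 1.0 m
```

The guard against angles at or beyond 90 degrees handles image rows that look at or above the horizon, for which no ground-plane distance exists.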
Experiments
We conducted experiments to verify that the distance to an obstacle can be obtained correctly by this method, using two types of smartphones with different angles of view, shown in Table 3. As shown in Fig. 6, the acquired image was divided into four parts in the y-axis direction, and the distances of the five points (A) to (E) were calculated using Eq. (3) and Eq. (4). The tilt angle of the smartphone was changed from 0 to 90 degrees in 10-degree increments, and the calculated and measured distances at each angle were compared. The height h was fixed at 1.2 m, assumed to be chest height. Table 4 shows the results for smartphone A, and Fig. 7 graphs them. The calculated values follow the tendency of the measured values, and although there are some errors, they are considered to be within an acceptable range.
The experimental results for smartphone B are shown in Table 5 and Fig. 8. The angle of view and image size of this smartphone differ from those of smartphone A, but it also shows good results.

Conclusions
This paper explained a method of obtaining the distance to an obstacle from an image acquired by a smartphone and showed by experiments that the proposed calculation method is appropriate. When the actual distance was within 2 m, the average error was 7 cm and the maximum was 18 cm. Although there were such slight errors, they were considered to be within an acceptable range for the purpose of notifying the visually impaired of the existence of an obstacle.
As future work, we plan to implement a function that recognizes multiple obstacles by applying YOLO or similar methods in the smartphone application and obtains the distance to each obstacle using its position on the image.