Study of Innovating Threshold Value Processing for Character Detection in Scene Image

Methods that detect characters in a scene image using character shape features have been proposed. However, many edges in a scene resemble character features and cause false detections, which means that shape features alone can improve accuracy only to a limited extent. Another method focuses on the environment in which the characters appear, and it improves detection accuracy. Even so, detection leakage (missed characters) occurs in some cases because the character area and the non-character area cannot be separated accurately. We attribute this to the threshold processing. Therefore, this study uses the Sauvola method for threshold processing. A simulation compares the proposed method with the conventional one.


Introduction
Character detection in scene images is the process of detecting characters in a scene image taken with a digital camera. A scene image contains a variety of information, and characters are particularly useful in daily life. Scene images are easy to obtain with widespread digital cameras and mobile phones with camera modules, and the characters they contain can provide useful information. One conceivable application is to photograph billboards and signs in an unfamiliar place, detect the character information, and then use the detected result to obtain a translation or to access a web page with relevant information.
In this case, the accuracy of character recognition is important. Extracting the characters without other objects is thought to contribute to good recognition accuracy. The color and brightness of characters vary with the surrounding circumstances, and font size and shape are also diverse in real environments. Character detection methods have been proposed by many researchers (1) (2). Zhu et al. proposed a hybrid shape feature method to detect characters. However, many edges in a scene resemble character features and cause false detections, which means that shape features alone can improve accuracy only to a limited extent. Detecting only the character area in an image with a complex background is very difficult, and no effective method has been established. In fact, when people recognize characters in a scene image, in most cases they pay attention to the object on which characters might appear before they focus on the characters themselves (3). Furthermore, Kunishige et al. proposed a method that uses not only the shape features of characters but also the features of the surrounding environment (3). An improvement in detection accuracy was confirmed when the features of the surrounding environment were used. However, detection leakage still occurs when the character area and the non-character area are not separated exactly. We think that this detection leakage comes from loose threshold processing. In this paper, we propose improving the threshold processing by using the Sauvola method. A simulation compares the proposed method with the conventional one.

Principle

Outline of system
This study creates a binary image by applying threshold processing to the scene image. The connected components of the binary image are then labeled. Next, features are extracted from each connected component, and each component is judged to be a character area or not. Because image resolution is limited, a small component might not be readable even if it is in fact a character area. In this experiment, a component smaller than 10 pixels is treated as a non-character component.
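The labeling and size-filtering steps of this outline can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `label_components` and `candidate_components` are hypothetical, the binary image is assumed to be a 2D list of 0/1 values with 1 as foreground, 4-connectivity is used as stated in the threshold processing section, and "size" is assumed to mean the pixel count of a component.

```python
from collections import deque

# Assumption: "size" means pixel count; components below this are non-character.
MIN_COMPONENT_PIXELS = 10


def label_components(binary):
    """Return the 4-connected foreground components of a 2D list of 0/1 values.

    Each component is a list of (row, col) pixel coordinates.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # 4-connectivity: up, down, left, right (no diagonals).
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def candidate_components(binary, min_pixels=MIN_COMPONENT_PIXELS):
    """Keep only components large enough to be passed to the feature check."""
    return [comp for comp in label_components(binary) if len(comp) >= min_pixels]
```

Only the components returned by `candidate_components` would go on to the character feature judgment. Diagonally adjacent pixels form separate components under 4-connectivity, which is one reason small strokes can fall below the size limit.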

Threshold processing method
The Sauvola method used in this study is an adaptive threshold processing method. An adaptive threshold processing method determines the threshold for each target pixel from the mean and the standard deviation of the pixel density around it. The Sauvola method determines the threshold by the following equation.

\[ T(x,y) = m(x,y)\left[1 + k\left(\frac{s(x,y)}{R} - 1\right)\right] \]

where m(x,y) is the average pixel density in a window of constant width around the target pixel, s(x,y) is the standard deviation in that window, and R is the dynamic range of the standard deviation. k is a factor that decides the degree to which the standard deviation is taken into account. In this study, k is 0.2 and R is 32. Connected components are created according to the definition of 4-connected components.
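A minimal sketch of Sauvola thresholding, using the standard form of the threshold, T = m(1 + k(s/R - 1)), with the values k = 0.2 and R = 32 given above. Several points are assumptions made for illustration: the function name `sauvola_binarize` is hypothetical, the input is a 2D list of grayscale intensities, the window is 15×15 (the block size reported in the comparison section) and is clipped at the image border, and pixels darker than the local threshold are taken as foreground.

```python
import math


def sauvola_binarize(gray, window=15, k=0.2, R=32.0):
    """Binarize a 2D list of grayscale values with a per-pixel Sauvola threshold.

    Returns a 2D list with 1 where the pixel is darker than its local
    threshold (assumed foreground) and 0 elsewhere.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    half = window // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Window around the target pixel, clipped at the image border.
            y0, y1 = max(0, y - half), min(height, y + half + 1)
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            values = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            std = math.sqrt(variance)
            # Sauvola threshold: T = m * (1 + k * (s / R - 1)).
            threshold = mean * (1.0 + k * (std / R - 1.0))
            out[y][x] = 1 if gray[y][x] < threshold else 0
    return out
```

In a flat region the standard deviation is near zero, so the threshold drops to about (1 - k) times the local mean and background pixels are not marked. Near a stroke the standard deviation rises and the threshold moves toward the mean. This direct version recomputes every window and is slow on full-size images; integral images are a common way to speed it up.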

Character feature
Each connected component is judged to be a character area or not using character features. The character features are based on the shape of characters. Table 1 shows the character features.

Threshold processing method
The scene images used in this experiment were collected with an internet image search. We specified "park" and "sign" as search terms. From the search results, we chose images that satisfy the following conditions.

•Image size of 480×480 or more
•Color scene image
•Characters are present in the image
In this study, the character areas and non-character areas of each image were classified manually. If a character area and a non-character area were connected, the connected region was treated as a non-character area. The images were scaled down so that the image size is 640×480 or less. After character areas were detected, the results of the conventional method and the proposed method were compared.
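The downscaling step can be sketched as a size computation. This is an illustration under assumptions: "640×480 or less" is read as width at most 640 and height at most 480, the aspect ratio is preserved, and images that already fit are left unchanged. The function name `fit_within` is hypothetical.

```python
def fit_within(width, height, max_width=640, max_height=480):
    """Return the (width, height) to which an image would be scaled down.

    Integer arithmetic is used so that exact fits are not lost to
    floating-point rounding.
    """
    if width <= max_width and height <= max_height:
        return width, height  # already small enough; never scale up
    # Compare width/max_width with height/max_height without division.
    if width * max_height >= height * max_width:
        # Width is the binding constraint.
        new_width = max_width
        new_height = max(1, height * max_width // width)
    else:
        # Height is the binding constraint.
        new_height = max_height
        new_width = max(1, width * max_height // height)
    return new_width, new_height
```

For example, a 1280×960 image maps to 640×480, and a 1000×1000 image maps to 480×480 because the height limit binds first.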

Comparison of character detection result
In this study, the block size was set to 15×15 in both methods. Fig. 1 and Fig. 2 show the output of both methods. Fig. 1 is a case in which the Sauvola method reduced detection leakage, and Fig. 2 is a case in which it did not. Detection leakage occurred in the area surrounded by the red frame in each figure.

Consideration
Table 1. Character feature.
The character area and the non-character area are connected. Because they are not accurately separated at this part, the connected region is judged to be a non-character area by the character features. Detection leakage also occurs in a part of the character surrounded by the red frame in Fig. 4. The judgment by character features is made for each connected component. Therefore, a portion such as the "dot" of a character is judged to be a non-character area.

Conclusions
In this study, we adopted the Sauvola method as a new threshold processing method for detecting characters in a scene image. Using the Sauvola method reduced the detection leakage of characters compared with the conventional threshold processing approach. As future work, it is necessary to improve the separation accuracy between the character area and the non-character area. Furthermore, the detection leakage of tiny parts of characters, such as a "dot", needs to be reduced.