Method for Acquisition of the Distorted Gradient of Camera Document Images

Dewarping of camera document images has attracted considerable interest in recent years, since warping not only reduces the readability of a document but also lowers the accuracy of OCR applications. In this paper, an approach for efficiently obtaining the distorted gradient of camera document images is presented. The gradient of the text lines is obtained by counting the black pixels of the text along a scanning line. To avoid relying on a single, possibly erroneous gradient, the image is divided into 6 zones, and a gradient is found in each zone. Dewarping can then be achieved by a projective transformation. Experimental results on several document images demonstrate the robustness and effectiveness of the proposed technique.


Introduction
Non-linear warping is often observed in document images captured by a camera. Text in such images is strongly distorted, which degrades the performance of further processing, since contemporary OCR systems cannot handle distorted text.
Many different approaches have been proposed for document image dewarping [1][2]. These approaches can be classified into two broad categories, based on (i) 3-D document shape reconstruction [3][4][5][6][7][8] and (ii) 2-D document image processing [9][10][11][12][13][14][15][16][17][18][19][20][26][27]. Our work belongs to the second category. Two previous approaches of the second category are described in the following. In [26], the region to be rectified in a distorted picture can be specified. The technique transforms the outer contour of the distorted image into a rectangle whose top and bottom borders are parallel and whose left and right borders are parallel. However, the technique requires that the four sides of the whole printed matter (the top and bottom, or the left and right, of the printed sheet) be contained in the photographed picture; if only part of the printed matter is contained in the picture, it cannot be rectified.
[27] proposes a method for dewarping documents with inclined character strings, using the characteristics of the characters and the interlinear spaces of the document. Long continuous branches, which define the interlinear spaces of the document, are not parallel to each other in a distorted picture. The horizontal vanishing point is found from the characteristics of the characters, and the vertical vanishing point from the many vertical gaps between characters. The image is then dewarped by making the straight lines that pass through a vanishing point parallel to each other. However, if the interlinear spaces of the document are very narrow, the method does not work.
In this paper, we propose an approach for efficiently obtaining the horizontal vanishing point of camera document images. It is applied directly in 2-D space and can be used on documents with extremely narrow interlinear spaces. First, the image is divided into 6 zones. In each zone, a short straight line is rotated about a point through which it passes, and the number of black pixels on the line is counted. Second, the point is moved, and the distance between the point and the characters is judged from the variation in the number of black pixels. Third, when the point is close enough to the characters, the line is rotated to make it parallel to the characters. From the six gradients, the horizontal vanishing point is calculated. Experimental results on several document images demonstrate the robustness and effectiveness of the proposed technique. The remainder of the paper is organized as follows. In Section 2, the proposed technique is detailed, while experimental results are discussed in Section 3. Finally, conclusions are drawn in Section 4.

Proposed Method
This paper focuses on compensating for horizontal distortion when both horizontal and vertical distortions are present in a binary image. The proposed approach obtains the horizontal vanishing point directly in 2-D space and can be used on documents with extremely narrow interlinear spaces. It proceeds in three steps. First, the image is divided into 6 zones. In each zone, a short straight line is rotated about a point through which it passes, and the number of black pixels on the line is counted. Second, the point is moved, and the distance between the point and the characters is judged from the variation in the number of black pixels. Third, when the point is close enough to the characters, the line is rotated to make it parallel to the characters. Finally, the horizontal vanishing point is calculated from the six gradients.
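The text does not state how the six gradients are combined into a vanishing point. One plausible reading, sketched below under that assumption, is a least-squares intersection of the six lines, each given by a zone's point and its detected angle. The function name `vanishing_point` and the sample coordinates are hypothetical; angles are measured in image coordinates (x to the right, y downward).

```python
import numpy as np

def vanishing_point(points, angles_deg):
    """Least-squares intersection of lines given as (point, angle) pairs.

    Each line passes through points[i] with direction (cos t, sin t).
    A point p lies on the line when n . p = n . a, where n = (-sin t, cos t)
    is the line normal and a is the anchor point. Stacking one such equation
    per zone gives an overdetermined linear system, solved by least squares.
    """
    pts = np.asarray(points, dtype=float)
    t = np.deg2rad(np.asarray(angles_deg, dtype=float))
    normals = np.column_stack((-np.sin(t), np.cos(t)))
    rhs = np.einsum("ij,ij->i", normals, pts)
    vp, *_ = np.linalg.lstsq(normals, rhs, rcond=None)
    return vp

if __name__ == "__main__":
    # Hypothetical anchor points, one per zone (2 rows x 3 columns is assumed).
    anchors = [(100, 100), (400, 100), (700, 100),
               (100, 500), (400, 500), (700, 500)]
    # Synthetic angles for lines that all meet at (3000, 400).
    target = (3000.0, 400.0)
    angles = [np.degrees(np.arctan2(target[1] - y, target[0] - x))
              for x, y in anchors]
    print(vanishing_point(anchors, angles))
```

If the text lines are exactly parallel, the vanishing point lies at infinity and the system becomes rank-deficient, so a real implementation would need to detect that case separately.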

Detecting black pixels
As shown in Fig. 1, the black pixels in a zone are detected by a scanning line that rotates about a point A. As shown in Fig. 2, only the black pixels within a certain distance d of the line are counted; they are marked as ○. The other pixels are not counted and are marked as black points.
Fig. 1 Rotating the line to detect the number of pixels.
Fig. 2 Calculating the number of pixels.
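The counting rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default values of `d` and `half_len` (half the length of the short scanning line), and the 1-degree angular step are assumptions. The image is taken to be a boolean array in which `True` marks a black pixel, and angles are measured in image coordinates (y downward).

```python
import numpy as np

def count_black_near_line(binary, a, theta_deg, d=1.5, half_len=40.0):
    """Count black pixels within distance d of a short line through point a.

    binary    : 2-D boolean array, True where the pixel is black.
    a         : (x, y) position of the rotation point A.
    theta_deg : angle of the scanning line in degrees.
    """
    ys, xs = np.nonzero(binary)            # coordinates of all black pixels
    dx = xs - a[0]
    dy = ys - a[1]
    t = np.deg2rad(theta_deg)
    ux, uy = np.cos(t), np.sin(t)          # unit direction of the line
    along = dx * ux + dy * uy              # signed position along the line
    across = np.abs(dx * uy - dy * ux)     # perpendicular distance to the line
    near = (across <= d) & (np.abs(along) <= half_len)
    return int(np.count_nonzero(near))

def angle_profile(binary, a, step_deg=1.0, **kwargs):
    """Black-pixel count for every scanned angle in [0, 180)."""
    angles = np.arange(0.0, 180.0, step_deg)
    counts = np.array([count_black_near_line(binary, a, th, **kwargs)
                       for th in angles])
    return angles, counts
```

Scanning only 180 degrees is sufficient here because the line extends symmetrically on both sides of A.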

Getting the gradient of inclined character strings
The gradient of a warped document image is obtained in two steps. The first step judges the position of the point to find out whether it is close enough to the text. If it is not, the second step moves the point toward the text. When the point is close enough to the text, the line is rotated until it is parallel to the text line. The gradient of the line is then the gradient of the distorted text line.

Step 1. Judge the Distance between the Point and the Text
At the beginning, it must be judged whether the point is close enough to the text. There are three possible situations for the point relative to the text and, accordingly, three relationships between the scanning-line angle and the number of detected pixels. If the point is far away from the text (Fig. 3), no pixels are detected while the line rotates through 180 degrees (Fig. 4). If the point is a little closer to the text (Fig. 5), some pixels are detected at some of the scanned angles (Fig. 6). If the point is inside the text (Fig. 7), some pixels are detected at all scanned angles (Fig. 8). In this last situation, if there is enough space between the character strings, the horizontal vanishing point can be obtained. If the space between the lines is very narrow, however, the point should be moved up above the first line or down below the last line.

Step 2. Approach the Point to the Strings
When the point is moved away from the text, the number of angles at which pixels can be detected decreases. The point is therefore moved tentatively in each of four directions, and the pixels are scanned each time; the point is then moved in the direction in which the number of such angles increases. Repeating this operation yields a point that is close enough to the text, as shown in Fig. 11. The relationship between the angle and the number of pixels is shown in Fig. 12.
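The two steps can be sketched as a classification of the angle profile followed by a simple hill climb. This is only an illustration under stated assumptions: the names `angle_counts`, `classify` and `approach`, the move size `step`, and the stopping threshold `target_degrees` (a stand-in for "close enough", which the text does not quantify) are all hypothetical. The image is a boolean array with `True` for black pixels.

```python
import numpy as np

def angle_counts(binary, a, d=1.5, half_len=40.0, step_deg=1.0):
    """Black-pixel count near the scanning line for each angle in [0, 180)."""
    ys, xs = np.nonzero(binary)
    dx = (xs - a[0])[None, :]
    dy = (ys - a[1])[None, :]
    t = np.deg2rad(np.arange(0.0, 180.0, step_deg))[:, None]
    along = dx * np.cos(t) + dy * np.sin(t)
    across = np.abs(dx * np.sin(t) - dy * np.cos(t))
    near = (across <= d) & (np.abs(along) <= half_len)
    return np.count_nonzero(near, axis=1)

def classify(counts):
    """Step 1: map an angle profile to one of the three situations."""
    hit = int(np.count_nonzero(counts))
    if hit == 0:
        return "far"        # no pixels at any angle
    if hit == len(counts):
        return "inside"     # pixels at every angle
    return "near"           # pixels at some angles only

def approach(binary, a, step=5, target_degrees=120, max_iter=200, **kwargs):
    """Step 2: move the point in the direction that increases the number
    of angles at which pixels are detected, until target_degrees is reached
    or no move improves the result."""
    a = (int(a[0]), int(a[1]))
    hit = int(np.count_nonzero(angle_counts(binary, a, **kwargs)))
    for _ in range(max_iter):
        if hit >= target_degrees:
            break
        moves = [(a[0] + step, a[1]), (a[0] - step, a[1]),
                 (a[0], a[1] + step), (a[0], a[1] - step)]
        scored = [(int(np.count_nonzero(angle_counts(binary, m, **kwargs))), m)
                  for m in moves]
        best_hit, best_move = max(scored)
        if best_hit <= hit:
            break           # no direction improves; stop here
        hit, a = best_hit, best_move
    return a, hit
```

Note that a point classified as "far" sees no pixels in any direction, so this hill climb cannot make progress from it; the sketch assumes the starting point already detects pixels at some angles.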

Experimental results
To verify the validity of the proposed method, we use four sets of images: SET-1, SET-2, SET-3 and SET-4. SET-1 contains documents with different character sizes; two example document images from SET-1 are shown in Fig. 13(a) and 13(b). SET-2 contains documents with extremely narrow spacing between lines; an example document image from SET-2 is shown in Fig. 14. SET-3 contains documents with English characters; an example document image from SET-3 is shown in Fig. 15. SET-4 contains document images with different inclination gradients; two example document images from SET-4 are shown in Fig. 16(a) and 16(b). The experiments on all four sets were successful.
However, errors also occurred, as follows. As shown in Fig. 17, the line was drawn in the direction of characters that have no relationship with each other; the gap between characters in English text is much larger than that in Japanese text. In the case of Fig. 18, the error occurs because, by chance, no character exists at a certain angle. Since the angle of an erroneous line is obviously different from that of the correct ones, the correct angle can easily be determined from the other 5 angles.

Conclusions
In this paper, we proposed a method for obtaining the horizontal vanishing point of distorted camera document images with inclined character strings. Our experimental results show that the proposed method obtains the inclination gradient of document images well and improves on the method proposed in [27]. As the next step, we will dewarp the image according to the horizontal vanishing point.

Fig. 3 Point is far away from text.

Fig. 5 Point is a little closer to text.

Fig. 6 Pixels can be detected in some degrees.

Fig. 7 Point is in the text.

Fig. 8 Pixels are detected in all degrees.

Fig. 9 Point is moved away from text.

Fig. 10 Pixels can be detected in fewer degrees.

Fig. 11 Point is moved right, close to text.

Fig. 14 Interline spacing of text is very narrow.

Fig. 16(b) Inclination gradient of document is 35°.

Fig. 17 Error happens if the gap between characters is big.