Image Transformation Method to Refine Handwritten Characters Using Generative Adversarial Networks

Handwritten characters convey the individuality of the writer more readily than characters produced by a printer or similar device. Even as digitization progresses, handwriting is still used when one wants to convey important information to another party with personal feeling. For this reason, knowledge, experience, and skill are needed to write characters well. In this paper, we propose an image processing method that corrects handwritten characters containing imperfections in overall balance and in local shapes such as 'tome', 'hane', and 'harai' into well-balanced character forms. Specifically, using a Generative Adversarial Network (GAN), we simultaneously train a generator that produces a well-written character image from a poorly written character image and a discriminator that distinguishes poorly written character images from well-written ones.


Introduction
Handwritten characters convey the individuality of the writer more readily than characters produced by a printer or similar device. Even as digitization progresses, handwriting is still used when one wants to convey important information to another party with personal feeling. Japanese text, however, mixes a wide variety of characters, including hiragana, katakana, kanji, numerals, and the Latin alphabet. When writing by hand, each of these characters must be written in a well-balanced manner. For this reason, knowledge, experience, and skill are required to write characters well. Good balance also depends on local shapes such as 'tome', 'hane', and 'harai'.

In this paper, we propose an image processing method that corrects handwritten characters with such deficiencies (poorly written characters) into characters with a good appearance (well-written characters). Specifically, using a Generative Adversarial Network (GAN), we simultaneously train a generator that produces a well-written character image from a poorly written character image and a discriminator that distinguishes poorly written character images from well-written ones. The model used here generates images with a CNN, which yields images of good resolution. In the generator, an autoencoder performs dimensionality reduction to extract local features. In addition, average pooling is used for the output layer.

Related work
2.1 GAN
The Generative Adversarial Network (GAN) is a generative network proposed in 2014 (3). A GAN is a kind of generative model that uses a generator and a discriminator during training. The discriminator is trained as a binary classifier that determines whether its input is an output of the generator or actual data. The generator is trained so that the output it produces from a latent variable is judged by the current discriminator to be actual data. As a result, the generator learns to output data that the discriminator classifies as real. By alternating these two training steps, both the discriminator and the generator eventually become accurate models.

Here, let p_data(x) be the probability distribution of real data x, and p_z(z) be the probability distribution of the images z input to the generator. The final goal is for the distribution of the generated data G(z) to match p_data(x). The generator is trained so that the discriminator's output D(G(z)) is large when the generator's output G(z) is fed to the discriminator. The discriminator is trained to increase its output D(x) for actual data x and to decrease its output D(G(z)) for generated data. These objectives are expressed as a minimax problem on the value function V(D, G), which the discriminator maximizes and the generator minimizes.
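For reference, the standard GAN value function is sketched here, assuming the symbol p_z(z) for the distribution of generator inputs z (a notation introduced for this illustration):

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator increases V by raising D(x) and lowering D(G(z)), while the generator decreases V by raising D(G(z)). In practice the generator is often trained to maximize log D(G(z)) instead, which agrees with the description of making D(G(z)) large.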

DCGAN (Deep Convolutional GAN) is a GAN model using a CNN, proposed by Alec Radford et al. in 2015. With a CNN, high-resolution images can be generated efficiently. Beyond incorporating a CNN, DCGAN introduces several design choices. In a CNN, downsampling is usually performed by max pooling, but in the discriminator it is replaced by convolution layers with stride 2, and no pooling is used in the generator. In the discriminator, fully connected layers are not used; global average pooling is used instead. This slows convergence but has the effect of preventing overfitting (4). In the paper by Radford et al., bedroom images were learned with DCGAN to build a generative model, which produces images that are visually difficult to distinguish from real data.

As shown in Figures 3.1 and 3.2, the neural network used in the proposed method consists of two parts: a generator network that converts poorly written handwritten characters into well-written characters, and a discriminator network that distinguishes the converted images produced by the generator from well-written character images. Training alternates between two processes. In the first, the discriminator parameters are fixed and the generator is trained so that the discriminator judges its output to be a well-written character. In the second, the generator parameters are fixed and the discriminator is trained so that it correctly determines whether its input is a well-written character image or a converted image. Because the discriminator learns quickly, the generator tends to output well-written images of a different character from the input. Dimensionality reduction was performed using an autoencoder. Therefore, in this research, we reduced the number of convolution layers in the discriminator and let the generator learn first. If the network is trained well, the generator becomes able to produce images that reliably mislead the discriminator. That is, the trained generator can convert poorly written character images into well-written character images.
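The alternating procedure can be illustrated with a minimal sketch of the two losses and an update schedule. This is not the authors' implementation: the function names, the `EPS` constant, and the `generator_warmup` parameter are hypothetical, the generator loss is the common non-saturating form, and reading "let the generator learn first" as a generator-only warm-up is an assumption.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator step (generator fixed).

    d_real: D(x) for well-written character images, each in (0, 1)
    d_fake: D(G(z)) for converted images, each in (0, 1)
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Loss for the generator step (discriminator fixed).

    Minimizing it pushes D(G(z)) toward 1, i.e. toward a
    "well-written" judgment for converted images.
    """
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


def update_schedule(num_steps, generator_warmup=0):
    """Yield "G" or "D" per step: generator-only warm-up, then alternate."""
    for step in range(num_steps):
        if step < generator_warmup:
            yield "G"
        else:
            yield "G" if (step - generator_warmup) % 2 == 0 else "D"


# Hypothetical usage: two warm-up steps, then alternating updates.
print(list(update_schedule(6, generator_warmup=2)))
```

When the discriminator outputs 0.5 for every input, `discriminator_loss` equals 2 log 2, which corresponds to a discriminator that cannot tell converted images from well-written ones.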

Experiments settings
In the experiment, training images were created from eight handwritten variations (eight samples per character) of each of 80 kanji taught in the first grade in Japan, together with character images obtained from a commercial dictionary of well-written characters (poorly written characters: 80 x 8 = 640 patterns; well-written characters: 80 patterns). The images were rotated, scaled, and shifted to increase the numbers of well-written and poorly written character images to 10,000 each.
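The augmentation step can be sketched as follows. This is a minimal illustration, not the procedure used in the experiment: the parameter ranges, the nearest-neighbor sampling, and all function names are hypothetical, and images are represented as plain lists of pixel rows.

```python
import math
import random


def copies_per_image(target_total, num_sources):
    """Variants needed per source image to reach at least target_total."""
    return math.ceil(target_total / num_sources)


def sample_params(rng, max_angle=10.0, max_scale_dev=0.1, max_shift=3):
    """Draw a random rotation (degrees), scale factor, and pixel shift.

    The ranges are hypothetical; the source does not state them.
    """
    angle = rng.uniform(-max_angle, max_angle)
    scale = rng.uniform(1.0 - max_scale_dev, 1.0 + max_scale_dev)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return angle, scale, dx, dy


def warp(img, angle_deg, scale, dx, dy, fill=0):
    """Rotate and scale about the image center, then shift.

    Uses inverse mapping with nearest-neighbor sampling; pixels that
    map outside the source image take the fill value.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the shift, then undo rotation and scaling about the center.
            u, v = x - dx - cx, y - dy - cy
            src_x = (cos_a * u + sin_a * v) / scale + cx
            src_y = (-sin_a * u + cos_a * v) / scale + cy
            sx, sy = int(round(src_x)), int(round(src_y))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def augment(img, n, seed=0):
    """Return n randomly warped variants of img (seeded for repeatability)."""
    rng = random.Random(seed)
    return [warp(img, *sample_params(rng)) for _ in range(n)]
```

With 640 poorly written patterns, `copies_per_image(10000, 640)` gives 16 variants per pattern, and with 80 well-written patterns, `copies_per_image(10000, 80)` gives 125.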

Experiments results
Figures 4.2 and 4.3 show characters beautified by correcting several types of poorly written characters. Fig. 4.4 shows the correction and beautification of multiple poorly written characters of the same type. The generator's input is a poorly written character image, and its output is a well-written character image. In Fig. 4.2, 'hane', 'harai', and 'tome' are corrected and beautified. In Fig. 4.3, each poorly written character is corrected individually rather than simply being replaced with a well-written character. In Figure 4.3, characters of the same type are corrected differently, which shows that the characters are not being replaced.

Figures 3.1 and 3.2 show the generator and discriminator models.

Figure 4.1 shows an example of an experimental image.