A Brief Review on Cycle Generative Adversarial Networks

Image-to-image translation is an important topic in computer vision. It aims to learn a mapping between input and output images from training datasets, so that an image can be translated from the style of one domain to that of another. By form, translation can be between two domains or among multiple domains drawn from different datasets; by training data, it can be paired or unpaired. As a successful representative of unpaired image translation between two domains, the CycleGAN model is of great significance to both research and application. Starting from the application background of CycleGAN, this paper compares it in turn with the basic deep learning model GAN, the paired image translation model pix2pix, other unpaired image-to-image translation models, and multi-domain translation models, and gives a structure diagram for each. It then analyzes further applications of CycleGAN in other areas of computer vision, especially person re-identification and face change. Finally, based on the actual running results of each model, the existing problems are analyzed and summarized, and directions for further research are given.


Introduction
Since the advent of the Convolutional Neural Network (CNN), deep learning, as an approach to implementing machine learning, has been widely used in computer vision and related fields: handwriting recognition, object tracking, speech recognition, face recognition, and fruit classification, where a model trained on a dataset of fruit images helps a person determine the type of fruit in a given image. A later milestone was AlphaGo, the first computer program to defeat a professional human Go player, the first to defeat a Go world champion, and arguably the strongest Go player in history, which drew more and more researchers' attention to deep reinforcement learning.
At the same time, Goodfellow et al. (5) proposed the Generative Adversarial Network (GAN) in 2014, which has been described as the coolest idea in machine learning in the past 20 years and which brought new breakthroughs to deep learning models. The key to GAN's success is the idea of an adversarial loss that forces the generated images to be, in principle, indistinguishable from real photos. This loss is particularly powerful for image generation tasks, as this is exactly the objective that much of computer graphics aims to optimize.
At UC Berkeley, Jun-Yan Zhu et al. (1) extended the application of GAN to image-to-image translation. The models pix2pix and CycleGAN were proposed and applied to a wide range of tasks, including collection style transfer, object transfiguration, season transfer and photo enhancement. (2), (3) This in turn led to further studies by more researchers in other fields, such as person re-identification, (8), (13) multi-domain translation, (10) game graphics models, and so on.

CycleGAN
In this section, we first briefly review the principle of CycleGAN, and then compare it with GAN, pix2pix, and other unpaired image-to-image translation models.

The background of CycleGAN
The task of image style translation is to change a particular aspect of a given image to another; that is to say, given any two unordered image collections X and Y, the algorithm learns to automatically "translate" an image from one into the other and vice versa. (1) The task can be divided into two situations: paired translation and unpaired translation.
For paired image translation, the commonly used model is pix2pix. (2) Examples show that the pix2pix model has a certain generality, but it also has limitations. First, the training samples must come in pairs, and in real life it is difficult to collect a large amount of paired data; if the amount of data is small, the model tends to overfit and training suffers. Second, the image transfer achieved by this model is limited to transforming the style of the same image, such as color changes. How to use unpaired samples with different styles for training and still complete the translation of image style is the basic design idea of the CycleGAN model.

CycleGAN Review
As one of the most interesting and important deep learning achievements of 2017, CycleGAN has found many applications in the past two years. The original schematic and formulation of the CycleGAN paper are as follows. (1), (8) Following Jun-Yan Zhu et al., (1) given two datasets {x_i} (i = 1 to M) and {y_j} (j = 1 to N), collected from two different domains X and Y, where x_i ∈ X and y_j ∈ Y, the goal of CycleGAN is to learn a mapping function G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.
CycleGAN contains two mapping functions, G: X → Y and F: Y → X. Two adversarial discriminators, D_X and D_Y, are proposed to distinguish whether images are translated from another domain. CycleGAN applies the GAN framework to train the generative and discriminative models jointly. The overall CycleGAN loss function is expressed as follows:
$$V(G, F, D_X, D_Y) = V_{GAN}(D_Y, G, X, Y) + V_{GAN}(D_X, F, Y, X) + \lambda V_{cyc}(G, F) \qquad (1)$$
where V_GAN(D_Y, G, X, Y) and V_GAN(D_X, F, Y, X) are the loss functions for the mapping functions G and F and for the discriminators D_Y and D_X, and V_cyc(G, F) is the cycle consistency loss that forces F(G(x)) ≈ x and G(F(y)) ≈ y, so that each image can be reconstructed after a cycle mapping. λ weights the relative importance of V_GAN and V_cyc. For more details of CycleGAN, refer to (1).
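As a rough illustration of how the two adversarial terms and the cycle consistency term combine into one objective, the following Python sketch evaluates such a loss on toy one-dimensional "images" (plain floats). The linear maps G and F, the logistic discriminators, the sample values, the L1 form of the cycle term, and the weight lam = 10 are all assumptions chosen for illustration; this is not the original authors' implementation.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def v_gan(disc, gen, sources, targets):
    """Adversarial term: E_t[log D(t)] + E_s[log(1 - D(gen(s)))]."""
    real_term = mean(math.log(disc(t)) for t in targets)
    fake_term = mean(math.log(1.0 - disc(gen(s))) for s in sources)
    return real_term + fake_term

def v_cyc(G, F, xs, ys):
    """Cycle consistency term (L1 form assumed): |F(G(x)) - x| and |G(F(y)) - y|."""
    forward = mean(abs(F(G(x)) - x) for x in xs)
    backward = mean(abs(G(F(y)) - y) for y in ys)
    return forward + backward

def cyclegan_objective(G, F, D_X, D_Y, xs, ys, lam=10.0):
    """Sum of both adversarial terms plus the weighted cycle consistency term."""
    return v_gan(D_Y, G, xs, ys) + v_gan(D_X, F, ys, xs) + lam * v_cyc(G, F, xs, ys)

# Hypothetical toy setup: domain X clusters near 0, domain Y near 2.
xs = [-0.1, 0.0, 0.2]
ys = [1.9, 2.0, 2.3]
G = lambda x: x + 2.0               # X -> Y
F = lambda y: y - 2.0               # Y -> X (inverse of G)
D_X = lambda v: sigmoid(1.0 - v)    # high score for values that look like X
D_Y = lambda v: sigmoid(v - 1.0)    # high score for values that look like Y

cycle_error = v_cyc(G, F, xs, ys)   # close to zero because F undoes G
total = cyclegan_objective(G, F, D_X, D_Y, xs, ys)
```

Replacing F with a map that does not invert G (for example y - 1.0) leaves the adversarial terms well defined but makes the cycle term grow, which is the behaviour the cycle consistency loss is meant to penalize.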

Comparison with GAN
As shown in Fig. 1, the GAN model is composed of a generative net (G) and an adversarial net (D); that is, G is the generator and D is the discriminator. The main role of G is to translate random noise z into samples G(z), and the role of D, which is equivalent to a binary classifier, is to judge whether the input image is real or fake: it outputs 1 for a real image x and 0 otherwise.
During training, G and D play a dynamic game against each other. We train D to maximize the probability of assigning the correct label to both training examples and samples from G. We simultaneously train G to minimize log(1 − D(G(z))). In other words, D and G play the following two-player minimax game with the value function V(G, D): (5)
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (2)$$
Here x represents a real image, z represents the noise input to G, G(z) represents the image generated by G, and D(x) represents the probability, as judged by D, that the image x is real.
CycleGAN is based on the GAN model and is composed of two GANs. Instead of random noise, it takes images given by the user as input, which improves user control as well as resolution and image quality.
Combined with the overall CycleGAN loss function expressed in Eq. 1, which has two discriminators and two generators, the model forms a circular structure, hence the name Cycle. The framework is shown in Fig. 2.
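To make the minimax game of this section more concrete, the sketch below estimates the GAN value function by sampling. The Gaussian "real" data, the identity generator, and the two hand-written discriminators are hypothetical stand-ins chosen for illustration, not trained networks.

```python
import math
import random

def value_function(D, G, real_samples, noise_samples):
    """Monte Carlo estimate of V(G, D) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    real_term = sum(math.log(D(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - D(G(z))) for z in noise_samples) / len(noise_samples)
    return real_term + fake_term

rng = random.Random(0)
real_samples = [rng.gauss(2.0, 0.5) for _ in range(1000)]    # hypothetical "real" data
noise_samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # noise z fed to G

G_poor = lambda z: z                                         # generator that ignores the data
D_sharp = lambda v: 1.0 / (1.0 + math.exp(-4.0 * (v - 1.0))) # separates values around 1
D_blind = lambda v: 0.5                                      # cannot tell real from fake

v_sharp = value_function(D_sharp, G_poor, real_samples, noise_samples)
v_blind = value_function(D_blind, G_poor, real_samples, noise_samples)
```

Against this poor generator the discriminating D reaches a higher value than the blind one, which is the direction D is trained in. A discriminator that outputs 0.5 everywhere gives V = −log 4, the value Goodfellow et al. associate with a generator whose samples match the data distribution.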

Comparison with pix2pix
The fundamental similarity between pix2pix and CycleGAN is that both translate one type of image into another. The difference between them is shown in Fig. 3: pix2pix learns the mapping between an input image and an output image using a training set of paired data {x_i, y_i} (i = 1 to N), where a correspondence between x_i and y_i exists.
Like CycleGAN, pix2pix is based on the GAN model. It works by training on pairs of images, such as building facade labels and building facades, and then attempts to generate the corresponding output image from any input image you give it. (17) Instead of random noise alone, its input is the image given by the user; the framework is shown in Fig. 4. Pix2pix uses the conditional GAN (cGAN), whose objective can be expressed as: (2), (3)
$$L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$$
For comparison, pix2pix also trains an ordinary (unconditional) GAN, in which D only judges whether an image is real:
$$L_{GAN}(G, D) = \mathbb{E}_{y}[\log D(y)] + \mathbb{E}_{x,z}[\log(1 - D(G(x, z)))]$$
In addition, on top of the conditional GAN objective, (3) an L1 loss is added to the G network so that the output translated from the source domain is as close as possible to the target-domain image:
$$L_{L1}(G) = \mathbb{E}_{x,y,z}[\| y - G(x, z) \|_1]$$
The overall pix2pix loss function is expressed as:
$$G^* = \arg\min_G \max_D L_{cGAN}(G, D) + \lambda L_{L1}(G)$$
As in Eq. 2, G tries to minimize this objective against an adversarial D that tries to maximize it.
As described in Section 2.2, CycleGAN trains with unpaired data, consisting of a source set {x_i} (i = 1 to M), x_i ∈ X, and a target set {y_j} (j = 1 to N), y_j ∈ Y, and no correspondence between the input and output images is required. Compared with pix2pix, it therefore has higher scalability and applicability. Its overall loss function is given by Eq. 1.
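The role of pairing in pix2pix can be sketched with a toy version of its objective: a conditional adversarial term, in which D sees the input together with the output, plus a weighted L1 term that needs the paired target. The scalar "images", the hand-written D, the deterministic generators (the noise z is omitted), and the weight lam = 100 are assumptions made for illustration only.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def cgan_loss(D, G, pairs):
    """E[log D(x, y)] + E[log(1 - D(x, G(x)))], with the input x given to D as a condition."""
    real_term = mean(math.log(D(x, y)) for x, y in pairs)
    fake_term = mean(math.log(1.0 - D(x, G(x))) for x, _ in pairs)
    return real_term + fake_term

def l1_loss(G, pairs):
    """E[|y - G(x)|]: distance between the generated output and the paired target."""
    return mean(abs(y - G(x)) for x, y in pairs)

def pix2pix_objective(D, G, pairs, lam=100.0):
    """Conditional adversarial term plus the weighted L1 term."""
    return cgan_loss(D, G, pairs) + lam * l1_loss(G, pairs)

# Hypothetical paired data in which the target is always 3 * input.
pairs = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)]
D = lambda x, y: 1.0 / (1.0 + math.exp(abs(y - 3.0 * x) - 1.0))  # scores how well y matches x
G_good = lambda x: 3.0 * x
G_bad = lambda x: x + 1.0

good_l1 = l1_loss(G_good, pairs)   # zero: outputs coincide with the paired targets
bad_l1 = l1_loss(G_bad, pairs)
```

The L1 term can only be computed because each x comes with its own y. With unpaired sets, as in CycleGAN, this term is unavailable.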

Comparison with other unpaired Image-to-Image translation model
For unpaired image-to-image translation, other models besides CycleGAN have been proposed, such as DualGAN (4) and DiscoGAN. (12) All of these models tackle the unpaired setting and effectively alleviate the problem of obtaining data pairs, but they can only learn the relations between two different domains at a time.
Zili Yi et al. (4) developed the DualGAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In their architecture, the primal GAN learns to translate images from domain U to those in domain V, while the dual GAN learns to invert the task. To avoid costly pairing, Taeksoo Kim et al. (12) address the task of discovering cross-domain relations given unpaired data. They proposed a method based on generative adversarial networks that learns to discover relations between different domains (DiscoGAN).
Both CycleGAN and DiscoGAN preserve key attributes between the input and the translated image by using a cycle consistency loss. But unlike other approaches, CycleGAN does not rely on any task-specific, predefined similarity function between the input and output, nor does it assume that the input and output lie in the same low-dimensional embedding space. This makes the method a general-purpose solution for many vision and graphics tasks. (1)

Applications of CycleGAN
Because the CycleGAN framework is so general, it has attracted a lot of attention from image processing enthusiasts and researchers since its publication in 2017. In addition to simple applications, CycleGAN has important applications in many fields, such as person re-identification, (8), (13) face generation, (9) multi-domain image translation, (10), (30) and so on. (19), (31)

Implementation
In the paper, (1) the authors give examples of translation between horses and zebras, apples and oranges, and photos and paintings. In addition to the version developed by the original authors, this paper runs the TensorFlow version by xhujoy, (18) whose environment requirements and running process are shown in Fig. 5.
Instead of running on a GPU as the original authors did, Fig. 6 shows the translation results obtained on a CPU. Observing the training process, we can see that the translation results get better and better as the number of training epochs increases, and after running 200 epochs, as set in the original paper, many generated images are difficult to distinguish from real ones.

Simple application
First of all, in addition to the translations between horses and zebras, apples and oranges, photos and Monet paintings, and summer and winter listed in the original paper, people have come up with a variety of magical applications, (15) as shown in Fig. 7 (left). By applying CycleGAN, a cat can be translated into a dog; the first row shows the input images and the row below shows the translation results.
In addition, as shown in Fig. 7 (right), by simply changing the datasets and then directly applying the CycleGAN model, we can translate a male face into a female one or vice versa (using the CelebA dataset, with images uniformly scaled to 256×256). Of course, only the face change is considered here; other aspects, such as hairstyles and clothes, are not considered.
As another example, shown in Fig. 8, Jack Clark (16) collected ancient maps of Babylon, Jerusalem and London, and then translated them via CycleGAN into modern Google Maps and satellite views.

Person Re-Identification(Re-ID)
As a sub-problem of image retrieval, and because of its important applications in security and surveillance, person Re-ID has drawn a lot of attention from both academia and industry. (13) With the development of deep learning and the emergence of many datasets, person Re-ID performance has been significantly boosted. For example, the Rank-1 accuracy of single query on Market1501 (21) has improved from 43.8% (22) to 89.9%. (23) The Rank-1 accuracy on the CUHK03 (20) labeled dataset has improved from 19.9% (20) to 88.5%. (29) However, several challenges still hinder the application of person Re-ID, one of which is the scarcity of datasets. To address this challenge, the Person Transfer Generative Adversarial Network (PTGAN) was proposed by Longhui Wei et al. (13)

Different from CycleGAN, (1) PTGAN considers extra constraints on the person foregrounds to ensure the stability of their identities during transfer.
Some sample results generated by PTGAN and CycleGAN are shown in Fig. 9. Compared with CycleGAN, PTGAN generates person images of substantially higher quality: person identities are kept and the styles are effectively transformed. Extensive experimental results on several datasets show that PTGAN can effectively reduce the domain gap among datasets. (13) In addition to the dataset scarcity addressed by PTGAN, another challenge for person Re-ID is camera variation: being a cross-camera retrieval task, person re-identification suffers from image style variations caused by different cameras. As with PTGAN, using the image translation principle of CycleGAN, labeled training images can be style-transferred to each camera and, together with the original training samples, form an augmented training set. However, analysis and experiments show that this method also needs improvement: while increasing the number of samples helps to address over-fitting, it also introduces a considerable level of noise.
To address this problem, Zhun Zhong et al. (8) proposed a camera style (CamStyle) adaptation method. It employs CycleGAN to generate new training samples, treating the styles of different cameras as different domains. Given a Re-ID dataset containing images collected from L different camera views, the method learns an image-to-image translation model for each camera pair with CycleGAN. To encourage the style transfer to preserve color consistency between the input and the output, they add the identity mapping loss (1) to the CycleGAN loss function (Eq. 1), which forces the generator to approximate an identity mapping when real images of the target domain are used as input. Specifically, following the training strategy in (1), all training images are resized to 256×256, and CycleGAN is used to train camera-aware style transfer models for each pair of cameras.
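The identity mapping loss mentioned above can be sketched as follows. Each generator is fed real samples of its own target domain and is penalized for changing them. The scalar "pixel intensities", the clamping and shifting generators, and the L1 form of the penalty are hypothetical choices for illustration, not the CamStyle implementation.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def identity_loss(G, F, xs, ys):
    """E_y[|G(y) - y|] + E_x[|F(x) - x|]: each generator sees real images of its target domain."""
    return mean(abs(G(y) - y) for y in ys) + mean(abs(F(x) - x) for x in xs)

# Hypothetical 1-D "pixel intensities" from two camera styles.
xs = [0.2, 0.4, 0.6]   # camera X
ys = [0.5, 0.7, 0.9]   # camera Y

# Generators that leave target-domain inputs untouched incur no penalty ...
clamp_to_y = lambda v: min(max(v, 0.5), 0.9)   # maps into Y's range, identity on Y
clamp_to_x = lambda v: min(max(v, 0.2), 0.6)   # maps into X's range, identity on X
# ... while generators that shift every input (a global tint, say) are penalized.
shift_up = lambda v: v + 0.3
shift_down = lambda v: v - 0.3

preserving = identity_loss(clamp_to_y, clamp_to_x, xs, ys)
shifting = identity_loss(shift_up, shift_down, xs, ys)
```

In this toy setting the shifting generators pay a penalty of about 0.6 while the preserving ones pay none, which is how the term discourages needless color changes.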

Multi-domain image translation
In the study of image-to-image translation, the CycleGAN model removed the restriction to paired data, training on two image datasets with different styles to complete image style translation between two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models must be built independently for every pair of image domains.
That is to say, to translate among multiple domains from different datasets, a model must be retrained for each pair of domains. For example, with four styles of images, learning all mappings among the 4 domains with CycleGAN-style models requires training 4 × (4 − 1) = 12 generators. Fig. 10 illustrates how twelve distinct generator networks have to be trained to translate images among four different domains, so the computational cost is very high.
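The count in the example above generalizes to k domains as k × (k − 1) directed mappings. The short sketch below tabulates it for a few values of k. The function names are illustrative, and the second function simply assumes that one two-generator model is trained per unordered pair of domains.

```python
def pairwise_generators(num_domains):
    """Generators needed when every ordered pair of domains has its own mapping: k * (k - 1)."""
    return num_domains * (num_domains - 1)

def two_generator_models(num_domains):
    """Models needed if each model holds both directions (G and F) for one unordered pair."""
    return pairwise_generators(num_domains) // 2

for k in (2, 4, 8):
    print(k, pairwise_generators(k), two_generator_models(k))
```

For k = 4 this gives the twelve generators of the text (six two-generator models), and the count grows quadratically with the number of domains.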
[Fig. 10: Multi-domain model]
To address this limitation, Yunjey Choi et al. (10) proposed StarGAN in 2018, a novel and scalable approach that can perform image-to-image translation for multiple domains using only a single model. Such a unified model architecture allows simultaneous training on multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared with existing models, as well as the novel capability of flexibly translating an input image to any desired target domain. They empirically demonstrate the effectiveness of their approach on facial attribute transfer and facial expression synthesis tasks. (10)
As shown in Fig. 11, the StarGAN model takes in training data of multiple domains and learns the mappings between all available domains using only one generator and one discriminator. To the best of our knowledge, this work is the first to successfully perform multi-domain image translation across different datasets. (10) The training process of StarGAN is very similar to that of CycleGAN. The original implementation of StarGAN is in PyTorch. (26) Fig. 12 shows the test results obtained with a simple TensorFlow version, (24), (27) run simultaneously on images of our own choosing (the first three rows) and CelebA test images (the last two rows). Also, as shown in Fig. 13, (10) for facial attribute translation on the CelebA dataset, the first column shows the input images, the next three columns show the single-attribute transfer results, and the rightmost columns show the multi-attribute transfer results (H: hair color, G: gender, A: aged). StarGAN generates images of higher visual quality than the CycleGAN model. By using a mask vector that enables StarGAN to control all available domain labels, we can further extend to train multiple do-

Discussion and Future Work
Although the CycleGAN model has shown remarkable success in image-to-image translation between two domains, in addition to the multi-domain image translation mentioned above, there are still some aspects that need to be improved.
First of all, the accuracy of image translation needs to be improved. For example, when a person appears in the input image, CycleGAN will also turn the person into a zebra or horse, as shown in Fig. 15, which is unreasonable in reality. As a solution to such problems, Mask R-CNN (19) has been proposed: the image is segmented first and then translated. But the effect is not very satisfactory, because once people overlap with zebras or horses in the image, for example when a person rides a horse, the two are difficult to separate. At the same time, we find that if the translation process
13.(10) The first column shows the input images, next three columns show the single attribute transfer results, and rightmost columns show the multi-attribute transfer results.(H: Hair color, G: Gender, A: Aged), which is the facial attribute translation results on the CelebA dataset.It can be found that StarGAN generated images of higher visual quality compared to CycleGAN model.By utilizing a mask vector method that enables StarGAN to control all available domain labels, we can further extend to train multiple domains from different datasets, such as jointly training CelebA and RaFD images to change a CelebA image s facial expression using features learned by training on RaFD, as shown in Fig. 14. (10) Except as described above the CycleGAN model application of Person re-identification and face change, there are other applications in other fields ,such as, Xiaodan Liang et al [20] proposed the Contrasting GAN game graphics model, and so on.requires geometric changes, the translation result is also not ideal, as shown in Fig. 7, on the task of translate dog and cat transfiguration.Those problems need further study.
Another problem we observe is that the calculation cost needs to be reduced. The original program sets epoch=200, epoch_step=100, and batch_size=1. Instead of running on a GPU, the author trained it on a CPU. One epoch takes about six hours, so the total training time on the CPU is too long. Apart from these, there is an important problem in the image processing field that deserves attention: output resolution. Our analysis shows that CycleGAN can only output low-resolution images of 256p/512p. The pix2pixHD model solves this problem well. (25) Pix2pixHD takes a pyramidal approach: it first outputs a low-resolution image, which is then used as input to another network that generates a higher-resolution image.
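The cost figures quoted in this review can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, the generator count assumes one generator per ordered pair of domains (which reproduces the 4 × (4 − 1) = 12 figure given for four domains), and the training-time estimate assumes that every one of the 200 epochs takes the reported six hours on the CPU.

```python
def pairwise_generators(num_domains: int) -> int:
    """Generators needed if every ordered pair of domains has its own generator."""
    return num_domains * (num_domains - 1)


def total_training_hours(epochs: int, hours_per_epoch: float) -> float:
    """Total wall-clock hours, assuming every epoch takes the same time."""
    return epochs * hours_per_epoch


# Pairwise (CycleGAN-style) models versus a single unified generator (StarGAN-style).
for k in (2, 3, 4, 8):
    print(f"{k} domains: {pairwise_generators(k)} pairwise generators vs 1 unified generator")

# Settings reported above: epoch=200, about six hours per epoch on a CPU.
hours = total_training_hours(epochs=200, hours_per_epoch=6.0)
print(f"CPU training: {hours:.0f} hours, about {hours / 24:.0f} days")
```

Under these assumptions the pairwise generator count grows quadratically with the number of domains, and a full CPU run at the reported settings would take roughly 1200 hours, which is why GPU training and a single shared generator are attractive.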

Conclusions
The task of image-to-image translation is to change a particular aspect of a given image into another; it includes translation between two domains and among multiple domains. In this paper, we give a brief review of the Cycle Generative Adversarial Network (CycleGAN), including its application background, basic principles, and comparison with GAN, pix2pix, and other unpaired image-to-image translation models. Finally, we analyze its applications in the field of computer vision. Our future work will continue to study more effective and efficient image-to-image translation strategies, especially in the field of person re-identification.

Fig. 8. Map translation.
in 2018, which is inspired by CycleGAN. The PTGAN model is proposed to bridge the domain gap by transferring persons in dataset A to another dataset B. The transferred persons from A are desired to keep their identities while presenting similar styles, e.g., backgrounds,