There are two discriminators and two generators in CycleGAN. The authors also provide the code for their neural networks; I am particularly interested in the generator(s), which they implement here:
So they use a ResnetGenerator, but I’m afraid I do not really understand it yet (cf. lines 119-159 and 315-373).
For the generator, why do we have both downsampling (Conv2d) and upsampling (ConvTranspose2d) layers? I have generally seen generators that use only ConvTranspose2d layers, where the input is noise sampled from a uniform or Gaussian distribution…
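For reference, this is the kind of generator I am used to — a minimal DCGAN-style sketch where a noise vector is only ever upsampled. All layer sizes and names here are illustrative, not taken from the CycleGAN repo:

```python
import torch
import torch.nn as nn

# Minimal DCGAN-style generator sketch: noise vector in, image out.
# Layer sizes are illustrative, not from the CycleGAN repository.
class NoiseGenerator(nn.Module):
    def __init__(self, z_dim=100, ngf=64, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # project 1x1 noise up to 4x4 feature maps
            nn.ConvTranspose2d(z_dim, ngf * 4, 4, 1, 0), nn.ReLU(True),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1), nn.ReLU(True),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1), nn.ReLU(True),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(ngf, out_channels, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

z = torch.randn(2, 100, 1, 1)      # noise sampled from a Gaussian
img = NoiseGenerator()(z)
print(img.shape)                   # torch.Size([2, 3, 32, 32])
```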
Based on Section 7.1 of the paper, the authors are reusing the image transformation network from Perceptual Losses for Real-Time Style Transfer and Super-Resolution, which uses this bottleneck architecture. The key difference from the generators you describe is that CycleGAN performs image-to-image translation: the input is not noise but an image from the source domain, so the generator first downsamples (encodes) it, transforms the features with residual blocks at low resolution, and then upsamples (decodes) back to an image of the same size. I can't find more details about this choice, so I would assume that this architecture simply worked well in their implementation.
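A minimal sketch of that encoder/transformer/decoder pattern, assuming the usual ReflectionPad2d + InstanceNorm2d choices — the layer counts and class names here are mine, not the authors' exact code:

```python
import torch
import torch.nn as nn

# Residual block: transform features at low resolution without changing shape.
class ResnetBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
            nn.InstanceNorm2d(dim), nn.ReLU(True),
            nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection

# Sketch of a ResNet-style image-to-image generator (illustrative sizes).
class ResnetGenerator(nn.Module):
    def __init__(self, in_ch=3, ngf=64, n_blocks=6):
        super().__init__()
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, ngf, 7),
                  nn.InstanceNorm2d(ngf), nn.ReLU(True)]
        # downsampling (Conv2d): halve spatial size, double channels
        layers += [nn.Conv2d(ngf, ngf * 2, 3, stride=2, padding=1),
                   nn.InstanceNorm2d(ngf * 2), nn.ReLU(True),
                   nn.Conv2d(ngf * 2, ngf * 4, 3, stride=2, padding=1),
                   nn.InstanceNorm2d(ngf * 4), nn.ReLU(True)]
        # transform at the bottleneck with residual blocks
        layers += [ResnetBlock(ngf * 4) for _ in range(n_blocks)]
        # upsampling (ConvTranspose2d): back to the input resolution
        layers += [nn.ConvTranspose2d(ngf * 4, ngf * 2, 3, stride=2,
                                      padding=1, output_padding=1),
                   nn.InstanceNorm2d(ngf * 2), nn.ReLU(True),
                   nn.ConvTranspose2d(ngf * 2, ngf, 3, stride=2,
                                      padding=1, output_padding=1),
                   nn.InstanceNorm2d(ngf), nn.ReLU(True)]
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(ngf, in_ch, 7), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

x = torch.randn(1, 3, 128, 128)    # the input is an image, not noise
y = ResnetGenerator()(x)
print(y.shape)                     # torch.Size([1, 3, 128, 128])
```

So both layer types are needed: Conv2d layers encode the input image into a compact feature representation, and ConvTranspose2d layers decode it back to full resolution.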