Understanding the ResnetGenerator in the CycleGAN Model

In their famous research paper on CycleGANs (https://arxiv.org/pdf/1703.10593v7.pdf), the authors implement, well, a CycleGAN.

A CycleGAN consists of two discriminators and two generators. The authors also provide the code for their neural networks. I am particularly interested in the generator(s), which they implement here:

So they use a ResnetGenerator, but I’m afraid I do not really understand it yet (cf. lines 119-159 and 315-373).

For the generator, why do we have both downsampling (Conv2d) and upsampling (ConvTranspose2d) layers? I have generally seen generators that use only ConvTranspose2d layers, where the input is noise sampled from a uniform or Gaussian distribution (see the sketch below)…
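For reference, this is roughly the noise-to-image pattern I had in mind: a minimal DCGAN-style sketch where the whole generator is a stack of ConvTranspose2d layers. The class name, filter counts, and output size here are my own choices for illustration, not anything from the CycleGAN code:

```python
import torch
import torch.nn as nn

class NoiseGenerator(nn.Module):
    """Hypothetical DCGAN-style generator: noise in, image out,
    built only from ConvTranspose2d upsampling layers."""
    def __init__(self, nz=100, ngf=64, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # input: (N, nz, 1, 1) latent vector treated as a 1x1 "image"
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),        # -> 4x4
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),   # -> 8x8
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),   # -> 16x16
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),       # -> 32x32
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, out_channels, 4, 2, 1, bias=False),  # -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

z = torch.randn(1, 100, 1, 1)  # noise sampled from a Gaussian
img = NoiseGenerator()(z)
print(img.shape)  # torch.Size([1, 3, 64, 64])
```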

That’s why I am confused…

Based on Section 7.1 of the paper, the authors reuse the image transformation network from "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" (Johnson et al.), which uses this bottleneck architecture. The key difference from the noise-based setup above is that the CycleGAN generator performs image-to-image translation: its input is a full image rather than a noise vector, so it first downsamples (encodes) the input into a compact representation, transforms it with residual blocks, and then upsamples (decodes) it back to image resolution. I can't find more details about this choice, so I would assume that this architecture simply worked well for their implementation.
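To make the bottleneck structure concrete, here is a simplified sketch of such an encoder-transformer-decoder generator: downsampling convolutions, a chain of residual blocks, then transposed convolutions back up to image resolution. This is my own hedged reconstruction of the general pattern, not the authors' exact networks.py implementation:

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block: two 3x3 convs with a skip connection, y = x + F(x)."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
            nn.InstanceNorm2d(dim), nn.ReLU(True),
            nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        return x + self.block(x)

class ResnetGeneratorSketch(nn.Module):
    """Simplified encoder-transformer-decoder generator (my reconstruction)."""
    def __init__(self, in_ch=3, out_ch=3, ngf=64, n_blocks=9):
        super().__init__()
        layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, ngf, 7),
                  nn.InstanceNorm2d(ngf), nn.ReLU(True)]
        # Encoder: two stride-2 Conv2d layers halve the resolution twice
        for mult in (1, 2):
            layers += [nn.Conv2d(ngf * mult, ngf * mult * 2, 3, stride=2, padding=1),
                       nn.InstanceNorm2d(ngf * mult * 2), nn.ReLU(True)]
        # Transformer: residual blocks operate at the low (bottleneck) resolution
        layers += [ResnetBlock(ngf * 4) for _ in range(n_blocks)]
        # Decoder: two stride-2 ConvTranspose2d layers restore the resolution
        for mult in (4, 2):
            layers += [nn.ConvTranspose2d(ngf * mult, ngf * mult // 2, 3,
                                          stride=2, padding=1, output_padding=1),
                       nn.InstanceNorm2d(ngf * mult // 2), nn.ReLU(True)]
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(ngf, out_ch, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        # x is an image to translate, not a noise vector
        return self.model(x)

x = torch.randn(1, 3, 256, 256)  # an input photo
y = ResnetGeneratorSketch()(x)
print(y.shape)  # torch.Size([1, 3, 256, 256])
```

Running the residual blocks at the downsampled resolution keeps the computation cheap, and the skip connections let the network preserve the input's structure while changing its appearance, which is presumably why this architecture from Johnson et al. was an attractive choice for image-to-image translation.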