DCGAN tutorial generator backprop confusion

Hello, i’m learning pytorch and confused about how the loss works in the DCGAN tutorial -

the discriminator has 2 outputs (real or fake) so i understand how the loss is calculated here and the backprop of this information

the generator produces an output of size 3x64x64 but also effectively gets a loss based on how many times it fooled the discriminator. I don’t understand how this single loss value number is back propped into the 3x64x64 output. i’ve clearly missed something important !!

any guidance would be greatly appreciated. thanks

So, training a GAN model is broken into two steps:

  1. Updating the Discriminator (while the generator is frozen)
  2. Updating the generator (while the discriminator is frozen)

These two steps alternate on every training iteration: while the discriminator is being updated, the generator is frozen, and while the generator is being updated, the discriminator is frozen.
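To make the alternation concrete, here is a minimal sketch of one training iteration. It is not the tutorial's code: G and D are tiny toy MLPs, the sizes and the train_step helper are made up for illustration, and "freezing" the generator is done with detach(), which is one common way to keep gradients out of G during the discriminator step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy stand-ins for the tutorial's conv nets (sizes are arbitrary).
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4), nn.Tanh())
D = nn.Sequential(nn.Linear(4, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1), nn.Sigmoid())

criterion = nn.BCELoss()
# Each optimizer only knows about one network's parameters.
optim_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
optim_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(x_real):
    batch = x_real.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    z = torch.randn(batch, 8)
    x_syn = G(z)

    # Step 1: update D. detach() cuts the graph, so no gradient reaches G.
    optim_d.zero_grad()
    d_loss = criterion(D(x_real), real_labels) + criterion(D(x_syn.detach()), fake_labels)
    d_loss.backward()
    optim_d.step()

    # Step 2: update G. Only optim_g steps here, so D's parameters do not change.
    optim_g.zero_grad()
    g_loss = criterion(D(x_syn), real_labels)  # "real" labels: G wants to fool D
    g_loss.backward()
    optim_g.step()

    return d_loss.item(), g_loss.item()

# One iteration on a fake "real" batch of 5 samples.
d_loss, g_loss = train_step(torch.randn(5, 4))
```

In a real run, train_step would be called once per batch inside the epoch loop.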

Since your question is about the generator, for computing the loss for training the generator, we feed the input (can be a latent vector, or sometimes an input image) to the generator, and we get output x_syn.

For example, if G is our generator and its input is a latent vector z, then x_syn = G(z). Next, we feed the synthesized (fake) image x_syn to the discriminator to get output = D(x_syn). Here D is frozen, so its parameters are not updated. Then we compute the loss for this fake image: g_loss = criterion(output, labels), where labels are the labels for real images, since we want the generator to produce images that the discriminator classifies as real.

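Now, calling .backward() on g_loss computes the gradient of g_loss with respect to every parameter of the generator network G. The single loss value is propagated backward by the chain rule: first through D to the 3x64x64 image x_syn, and from there through each layer of G. Because the discriminator is frozen, no gradients are stored for its parameters. The generator's optimizer then uses these gradients to update G.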
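Here is a small sketch of how one scalar loss ends up as a gradient on the 3x64x64 output. The networks are hypothetical single-layer toys, not the DCGAN from the tutorial, and retain_grad() is only there so we can inspect the gradient on x_syn (PyTorch normally discards gradients of intermediate tensors).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy networks that only mimic the tutorial's input/output shapes.
G = nn.Sequential(nn.Linear(100, 3 * 64 * 64), nn.Tanh(), nn.Unflatten(1, (3, 64, 64)))
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())

# Freeze D: no gradients will be stored for its parameters.
for p in D.parameters():
    p.requires_grad_(False)

z = torch.randn(2, 100)            # batch of 2 latent vectors
x_syn = G(z)                       # shape (2, 3, 64, 64)
x_syn.retain_grad()                # keep the gradient on this intermediate tensor

output = D(x_syn)                  # shape (2, 1)
labels = torch.ones_like(output)   # "real" labels, since G wants to fool D
g_loss = nn.BCELoss()(output, labels)
g_loss.backward()

loss_shape = tuple(g_loss.shape)            # scalar: ()
image_grad_shape = tuple(x_syn.grad.shape)  # same shape as x_syn
weight_grad_shape = tuple(G[0].weight.grad.shape)

print(loss_shape, image_grad_shape, weight_grad_shape)
```

So the loss is one number, but its derivative with respect to each pixel of x_syn is also one number, which gives a gradient the same shape as the image. That gradient is then pushed further back into G's weights.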

Note that even if you do not freeze the discriminator, only the parameters of the generator are passed to the optimizer optim_g, so calling optim_g.step() will not affect the parameters of the discriminator. However, it is more efficient to freeze the discriminator network when we intend to update the generator, because gradients for the discriminator's parameters are then not computed at all.
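A quick way to convince yourself of this is to leave D unfrozen and compare its parameters before and after optim_g.step(). This is only a sketch with made-up toy layers and plain SGD, not the tutorial's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy networks; D is deliberately NOT frozen here.
G = nn.Linear(8, 4)
D = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())

# The optimizer is given only G's parameters.
optim_g = torch.optim.SGD(G.parameters(), lr=0.1)

# Snapshot both networks before the update.
d_before = [p.detach().clone() for p in D.parameters()]
g_before = [p.detach().clone() for p in G.parameters()]

output = D(G(torch.randn(5, 8)))
g_loss = nn.BCELoss()(output, torch.ones_like(output))
g_loss.backward()
optim_g.step()

# D received gradients (wasted work), but its parameters were never touched.
d_has_grads = all(p.grad is not None for p in D.parameters())
d_unchanged = all(torch.equal(a, b) for a, b in zip(d_before, D.parameters()))
g_changed = any(not torch.equal(a, b) for a, b in zip(g_before, G.parameters()))

print(d_has_grads, d_unchanged, g_changed)
```

The gradients that were computed for D here are the extra cost that freezing avoids.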

thanks, that’s a bit clearer.

i think i need to understand exactly what a gradient is and how it’s applied. what is the dimension of a gradient ?

The gradients have the same dimensions as the parameters. So basically, for each parameter in the network, there is one gradient value: the derivative of the loss with respect to that parameter, which determines how to update it.
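You can check this directly in PyTorch. The snippet below is a minimal sketch with a made-up one-layer network; the last few lines show what a plain gradient-descent update does with those values (the learning rate 0.1 is arbitrary, and real optimizers such as Adam do something more elaborate).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny network: one linear layer followed by a sigmoid.
net = nn.Sequential(nn.Linear(3, 2), nn.Sigmoid())

loss = net(torch.randn(4, 3)).mean()   # any scalar loss will do
loss.backward()

# For every parameter, record (parameter shape, gradient shape).
shapes = {
    name: (tuple(p.shape), tuple(p.grad.shape))
    for name, p in net.named_parameters()
}
for name, (p_shape, g_shape) in shapes.items():
    print(name, p_shape, g_shape)

# A plain gradient-descent update: each parameter moves against its own gradient.
lr = 0.1
with torch.no_grad():
    for p in net.parameters():
        p -= lr * p.grad
```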

right. got it. thanks for all your help 🙂