Backprop through generator to update latent

Hi!

I have a GAN-generator setup where I want to compute the loss of a generated image w.r.t. the true image and then backpropagate that loss to update the input vector. The relevant code would look something like this:

for epoch in range(100):
    for img, latent in dataset:
        gen_img = generator(latent)
        loss = loss_func(img, gen_img)       # img is the true image from the dataset
        if latent.grad is not None:          # .grad is None before the first backward
            latent.grad.zero_()
        loss.backward()
        with torch.no_grad():                # update the latent without tracking it in the graph
            latent.add_(latent.grad, alpha=-learning_rate)
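For reference, the same manual update can be delegated to an optimizer that only knows about the latent, so `zero_grad()` clears exactly the gradient you care about. This is a minimal runnable sketch with a toy `nn.Linear` standing in for the generator and random tensors for the latent and target image (all hypothetical stand-ins, not your actual model):

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins, just to make the sketch runnable.
    generator = nn.Linear(8, 16)
    true_img = torch.randn(16)
    loss_func = nn.MSELoss()
    learning_rate = 0.01

    latent = torch.randn(8, requires_grad=True)
    optimizer = torch.optim.SGD([latent], lr=learning_rate)  # optimizes the latent only

    losses = []
    for step in range(100):
        optimizer.zero_grad()            # clears latent.grad only, not the generator's grads
        gen_img = generator(latent)
        loss = loss_func(gen_img, true_img)
        loss.backward()                  # fills latent.grad (and accumulates the generator's param grads)
        optimizer.step()                 # in-place SGD update of the latent
        losses.append(loss.item())

Note that `loss.backward()` still accumulates gradients into the generator's parameters; that costs a little memory but, as discussed below, does not change the gradient computed for the latent.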

My question is: does it matter whether I zero out the gradients of the generator or not? My guess: since gradients are accumulated, the accumulated grads in the generator are used in the chain rule at every iteration to compute the derivative w.r.t. the latent, so the latent would get updated incorrectly.

Please correct me if this guess is incorrect.

Thanks!

I don’t think that’s the case, since the gradients won’t be included in the next gradient calculation unless you manually add them to the computation graph. To verify it, you could compare a run with and without zeroing the generator's gradients, using the same inputs and making sure the model is in eval() mode to disable potentially non-deterministic output from e.g. dropout layers.
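That comparison can be done in a few lines. The sketch below (using a toy `nn.Linear` as a stand-in for the generator, an assumption for brevity) zeroes the *latent's* gradient each iteration but never the generator's, and checks that the gradient computed for the latent is identical every time, even though the generator's parameter grads keep accumulating:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    generator = nn.Linear(4, 4)          # toy stand-in for the generator
    latent = torch.randn(4, requires_grad=True)

    grads = []
    for _ in range(3):
        out = generator(latent).sum()
        if latent.grad is not None:
            latent.grad.zero_()
        out.backward()                   # generator.weight.grad accumulates across runs
        grads.append(latent.grad.clone())

    # latent.grad is identical every time: the accumulated .grad buffers
    # of the generator's parameters are never read by autograd.
    assert torch.allclose(grads[0], grads[1]) and torch.allclose(grads[1], grads[2])

The `.grad` attributes are output buffers that backward *writes into*; the chain rule itself only uses the parameter *values* and the intermediate activations saved in the graph.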

@ptrblck Thanks for the response! So if I understand correctly, the gradients in the generator's computation graph are cleared automatically before every iteration?

I also came across this (link):

PyTorch uses a dynamic graph. That means that the computational graph is built up dynamically, immediately after we declare variables. This graph is thus rebuilt after each iteration of training.

Is this the reason? I am planning to run the two experiments soon, but just for the sake of understanding, it would be good to know how exactly the computation graphs and gradients work :slight_smile:
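The "graph is rebuilt each iteration" behavior can be seen directly: by default PyTorch frees a graph after backward, and the `.grad` buffers live outside the graph, so they accumulate across fresh graphs but never feed into them. A small self-contained demo (no generator involved, just a quadratic):

    import torch

    x = torch.randn(3, requires_grad=True)
    y = (x ** 2).sum()
    y.backward()            # the graph for y is freed after this call

    try:
        y.backward()        # a second backward on the same (freed) graph fails
    except RuntimeError:
        print("graph was freed; run the forward pass again to build a new one")

    # Re-running the forward pass builds a fresh graph. x.grad accumulates
    # across backward calls because it is a buffer, not part of any graph.
    g1 = x.grad.clone()     # 2 * x
    y2 = (x ** 2).sum()
    y2.backward()
    assert torch.allclose(x.grad, 2 * g1)

So the graph itself is indeed discarded and rebuilt each iteration; it is only the `.grad` buffers that persist, which is why you zero them manually (or via `optimizer.zero_grad()`) for the tensors you actually update.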
