Avoid gradient computation for a specific operation?

I have an auto-encoder with two parts, an encoder and a decoder. Its training step is:
latent_space = encoder(x)
y = decoder(latent_space)
latent_reconstruct = encoder(y)
loss = l1 * F.mse_loss(x, y) + l2 * F.mse_loss(latent_space, latent_reconstruct)

Here I want to use the encoder as a lens (previously I was using VGG), and I don't want gradients to flow through the encoder when it is applied the second time.
The simplest fix would be to copy the whole encoder before applying it the second time, but my encoder is very large and I can't afford to copy it on every iteration.
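For reference, a minimal runnable sketch of the training step described above; the two-layer encoder/decoder sizes and the l1/l2 loss weights are made-up placeholders, not the real model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the real (large) encoder and decoder.
encoder = nn.Sequential(nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 4))
decoder = nn.Sequential(nn.Linear(4, 16))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))

l1, l2 = 1.0, 0.5  # assumed loss weights
x = torch.randn(8, 16)

latent_space = encoder(x)
y = decoder(latent_space)
latent_reconstruct = encoder(y)  # second pass: gradients flow here too
loss = l1 * F.mse_loss(x, y) + l2 * F.mse_loss(latent_space, latent_reconstruct)

opt.zero_grad()
loss.backward()
opt.step()
```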


If you want to ignore a bunch of ops when the gradient is computed, you can wrap them in a no_grad block:

with torch.no_grad():
    latent_reconstruct = encoder(y)
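
As a quick sanity check (with toy one-layer modules made up for the example), the output of the no_grad pass carries no autograd history at all:

```python
import torch
import torch.nn as nn

encoder = nn.Linear(4, 2)  # toy stand-in for the real (large) encoder
decoder = nn.Linear(2, 4)
x = torch.randn(3, 4)

y = decoder(encoder(x))
with torch.no_grad():
    latent_reconstruct = encoder(y)

# The second pass was never recorded, so this tensor is a constant
# as far as autograd is concerned:
print(latent_reconstruct.requires_grad)  # False
```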

Another way to do this is to explicitly break the graph so that the y used in the second encoding is not linked to the rest of the network:

latent_reconstruct = encoder(y.detach())
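
In contrast (same toy modules as an assumption), detach() only cuts the link back through y; the second encoder pass itself is still tracked, so the encoder's own weights would still receive gradients from it:

```python
import torch
import torch.nn as nn

encoder = nn.Linear(4, 2)  # toy stand-in for the real (large) encoder
decoder = nn.Linear(2, 4)
x = torch.randn(3, 4)

y = decoder(encoder(x))
latent_reconstruct = encoder(y.detach())

# Still part of the graph, via the second encoder pass:
print(latent_reconstruct.requires_grad)  # True
# But the detached input has no link back to the decoder:
print(y.detach().grad_fn)  # None
```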

Both will stop gradients from flowing back into the decoder through y. Note the subtle difference: with no_grad, the second pass is skipped by autograd entirely, while with detach() the encoder's own weights still get gradients from the second pass. Pick the one that matches what you need.