I am trying to implement the ClusterGAN architecture in PyTorch. The following steps work in PyTorch 1.0 but not in torch 1.7.0+cu101:
```python
optimizer_ge = Adam(itertools.chain(encoder.parameters(), generator.parameters()), ...)
opt_disc = Adam(discriminator.parameters(), ...)
```
The generator and the encoder are updated together, and the discriminator is updated separately. The following is done for each batch of images:
```python
fake_image = generator(random_z)
fake_op = discriminator(fake_image)
real_op = discriminator(real_image)
zn, zc, zc_idx = encoder(fake_image)

ge_loss = cross_entropy_loss + clustering_loss  # the two generator/encoder loss terms
# the vanilla GAN discriminator loss disc_loss is computed with a BCE loss function
```
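The networks are then updated in this order (a sketch of the update order; the exact loss terms and hyperparameters are elided as above):

```python
optimizer_ge.zero_grad()
ge_loss.backward(retain_graph=True)  # keep the graph so disc_loss can also backprop
optimizer_ge.step()                  # in-place update of encoder/generator parameters

opt_disc.zero_grad()
disc_loss.backward()                 # backprops through the graph built before the step
opt_disc.step()
```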
The above code works fine in torch 1.0, but torch 1.7 throws the following error:
```
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 1, 4, 4]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
The error seems to be resolved when I do

```python
fake_op = discriminator(fake_image.detach())
```
However, after this change the results no longer match those of the code run in torch 1.0.
Can someone help me debug this?
Could you check if you might be facing a similar issue as described here?
Hi @ptrblck. Thanks for the reply. I took a look at the thread.
I actually want to implement the approach you suggested in that thread, but am currently failing to do so. My code above fails in torch 1.7 but works in torch 1.0.
You wrote:
> If you call opt1.step(), the parameters used to calculate loss2 were already updated and thus loss2 would be stale. The proper way would be to execute a new forward pass to compute loss2 and call loss2.backward() afterwards.
Does it need to be modified like this?
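Something along these lines (a rough sketch, reusing the names from my first post, with the exact loss terms elided):

```python
# update generator + encoder first
optimizer_ge.zero_grad()
fake_image = generator(random_z)
fake_op = discriminator(fake_image)
zn, zc, zc_idx = encoder(fake_image)
ge_loss = ...  # cross-entropy + clustering terms, as before
ge_loss.backward()
optimizer_ge.step()

# then a fresh forward pass for the discriminator update,
# so disc_loss is computed with the already-updated generator
opt_disc.zero_grad()
fake_op = discriminator(generator(random_z).detach())
real_op = discriminator(real_image)
disc_loss = ...  # vanilla GAN BCE loss on fake_op / real_op
disc_loss.backward()
opt_disc.step()
```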
Yes, this approach should work fine.
Thanks. Just to understand things a bit better: did something change in the way autograd works between torch 1.0 and torch 1.7?
The PyTorch implementation of the ClusterGAN architecture (torch 1.0) updates its networks in the following way:
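(roughly this pattern, reproduced from the sketch in my first post):

```python
optimizer_ge.zero_grad()
ge_loss.backward(retain_graph=True)
optimizer_ge.step()   # generator/encoder parameters change in-place

opt_disc.zero_grad()
disc_loss.backward()  # uses the graph built before the step above
opt_disc.step()
```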
The above doesn't work in torch 1.7, and I am not able to reproduce the torch 1.0 results using the suggested change (re-running the forward pass before the discriminator update, as sketched earlier).
Do you have any insights as to why this could happen?
Yes, the in-place updates of parameters now raise an error if you are using stale gradients, as described in the 1.5 release notes (in the "torch.optim optimizers changed to fix in-place checks for the changes made by the optimizer" section).
The reason is that the gradient computation would be incorrect. In your example you would calculate loss1 and loss2 using the model parameters in their initial state s0. loss1.backward() calculates the gradients, and opt1.step() updates the parameters to state s1. The computation graph of loss2, however, was created using the model in state s0, so loss2.backward() would calculate the gradients of loss2 w.r.t. the parameters in state s0, while the model is already updated to s1. These gradients would thus be wrong, and the error is raised.
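For illustration, here is a minimal, self-contained sketch of the failing pattern and the fix (a toy two-layer model, not your ClusterGAN code):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
opt1 = torch.optim.SGD(model.parameters(), lr=0.1)
opt2 = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 2)
loss1 = model(x).mean()         # graph references parameters in state s0
loss2 = model(x).pow(2).mean()  # this graph also references state s0

loss1.backward()
opt1.step()  # in-place update: parameters move to state s1

# loss2.backward()  # RuntimeError in torch >= 1.5: a saved parameter was
#                   # modified by an in-place operation (the optimizer step)

# fix: re-run the forward pass so the graph references the current state s1
opt2.zero_grad()
loss2 = model(x).pow(2).mean()
loss2.backward()
opt2.step()
```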