The generator and the encoder are updated together, and the discriminator is updated separately.

The following is done for each batch of images:

generator.train()
encoder.train()
generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()
optimizer_ge.zero_grad()

fake_image = generator(random_z)
fake_op = discriminator(fake_image)
real_op = discriminator(real_image)
zn, zc, zc_idx = encoder(fake_image)

# generator/encoder loss: a cross-entropy term plus a clustering term
ge_loss = cross_entropy_loss + clustering_loss
ge_loss.backward(retain_graph=True)
optimizer_ge.step()

opt_disc.zero_grad()
# compute the vanilla GAN discriminator loss disc_loss with a BCE loss
# on fake_op and real_op
disc_loss.backward()
opt_disc.step()

The above code works fine in torch 1.0, but torch 1.7 throws the following error:

one of the variables needed for gradient computation has been modified by an inplace operation:
[torch.cuda.FloatTensor [64, 1, 4, 4]] is at version 2; expected version 1 instead.
Hint: enable anomaly detection to find the operation that failed to
compute its gradient, with torch.autograd.set_detect_anomaly(True).
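
For reference, the anomaly detection mentioned in the hint is enabled with a single call at the top of the script; it makes the failing backward print the traceback of the forward operation that created the offending tensor:

import torch

# Report the forward op whose saved tensor triggers the backward failure.
# This slows execution noticeably, so enable it only while debugging.
torch.autograd.set_detect_anomaly(True)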

Hi @ptrblck. Thanks for the reply. I took a look at the thread.
I actually want to implement it the way you suggested there, but I am currently failing to do so.

If you call loss2.backward() after opt1.step(), the parameters used to calculate loss2 were already updated, and thus loss2 would be stale.
The proper way would be to execute a new forward pass to compute loss2 and call loss2.backward() afterwards.
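
In terms of the loop above, that would mean recomputing the discriminator outputs after the generator/encoder step instead of reusing the old graph. A rough sketch of the reordered loop body, where bce, real_labels, and fake_labels are hypothetical stand-ins for the actual BCE criterion and targets:

# generator/encoder update, as before
optimizer_ge.zero_grad()
fake_image = generator(random_z)
fake_op = discriminator(fake_image)
zn, zc, zc_idx = encoder(fake_image)
ge_loss = cross_entropy_loss + clustering_loss
ge_loss.backward()                 # retain_graph is no longer needed
optimizer_ge.step()

# fresh forward pass for the discriminator, so disc_loss is not stale
opt_disc.zero_grad()
fake_op = discriminator(fake_image.detach())  # detach: no gradients into the generator
real_op = discriminator(real_image)
disc_loss = bce(real_op, real_labels) + bce(fake_op, fake_labels)
disc_loss.backward()
opt_disc.step()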

Yes, the in-place updates of the parameters now raise an error if you are using stale gradients, as described in the 1.5 release notes (in the "torch.optim optimizers changed to fix in-place checks for the changes made by the optimizer" section).

The reason is that the gradient computation would be incorrect. In your example you would calculate loss1 and loss2 using the model parameters in their initial state s0. loss1.backward() calculates the gradients and opt1.step() updates the parameters to state s1. loss2, however, was computed using the model in state s0, so loss2.backward() would calculate the gradients of loss2 w.r.t. the parameters in s0 while the model has already been updated to s1. These gradients would thus be wrong, and the error is raised.
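
The failure mode can be reproduced in a few lines; a minimal, self-contained sketch of the pattern described above (two losses from one forward pass, with an optimizer step in between), which raises the error on PyTorch 1.5 and later:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
opt1 = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 2)
out = model(x)                     # forward pass with parameters in state s0
loss1 = out.mean()
loss2 = out.pow(2).mean()

loss1.backward(retain_graph=True)  # gradients w.r.t. s0
opt1.step()                        # parameters updated in place to s1

# loss2 still refers to the s0 graph; backprop through the second Linear
# needs its s0 weight, which opt1.step() modified in place, so the
# version-counter check raises the RuntimeError here
loss2.backward()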