I see. Even in this case, the final lossD.backward() runs into the same "variable modified in-place" error.
From @albanD's answer here: you can use del lossD instead of the final lossD.backward() to release the computational graph. Can you try that?
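
For concreteness, here is a minimal, self-contained sketch of that pattern. The tiny netD module and the fake input are placeholders I made up for illustration, not names from your code:

import torch
import torch.nn as nn

netD = nn.Linear(8, 1)                                   # placeholder discriminator
optimizer_disc = torch.optim.SGD(netD.parameters(), lr=0.1)

fake = torch.randn(4, 8)
lossD = netD(fake).mean()

optimizer_disc.zero_grad()
lossD.backward(retain_graph=True)   # graph is kept alive for the other losses
optimizer_disc.step()               # modifies netD's parameters in-place

# ... backward/step the remaining losses that share this graph ...

# A second lossD.backward() would now hit the in-place error, because the
# parameters saved in the graph were changed by optimizer_disc.step().
# Dropping the reference frees the retained graph without another backward:
del lossD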
Edit: Can you pack the encoder and decoder into one optimizer, or backward them together, if possible? The encoder's gradient computation depends on the decoder's parameters as well, so you can't call optimizer_decoder.step() before loss_ele.backward(). One solution is as follows (a sketch of the single-optimizer alternative is shown after the snippet):
# calculate the encoder, decoder, and discriminator losses here
# discriminator update
optimizer_disc.zero_grad()
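# retain_graph=True keeps the shared graph alive for the encoder/decoder backward below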
lossD.backward(retain_graph=True)
optimizer_disc.step()
# encoder and decoder update
optimizer_encoder.zero_grad()
optimizer_decoder.zero_grad()
loss_generator = loss_encoder + loss_decoder
loss_generator.backward()
optimizer_decoder.step()
optimizer_encoder.step()
# to release the computation graph of the discriminator
del lossD
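
If you go with the single-optimizer option instead, here is a rough sketch; the Linear encoder/decoder and the reconstruction loss are placeholders for your actual modules and losses:

import itertools
import torch

encoder = torch.nn.Linear(16, 8)    # placeholder modules; use your own
decoder = torch.nn.Linear(8, 16)

# One optimizer over both parameter sets, so a single backward()/step()
# updates the encoder and decoder together.
optimizer_gen = torch.optim.Adam(
    itertools.chain(encoder.parameters(), decoder.parameters()), lr=1e-3
)

x = torch.randn(4, 16)
loss_generator = (decoder(encoder(x)) - x).pow(2).mean()

optimizer_gen.zero_grad()
loss_generator.backward()
optimizer_gen.step()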