Problem: one of the variables needed for gradient computation has been modified by an inplace operation

In the posted code snippet you are using `loss.backward(retain_graph=True)`. Could you explain why this is necessary for your use case? It is often applied as a workaround for other errors, and it can itself cause the disallowed inplace modification: the retained graph still references tensors saved during the first forward pass, so an inplace update (e.g. `optimizer.step()`) between the two backward calls invalidates them.
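A minimal sketch of how this failure mode typically arises (this is a hypothetical reproduction, not your actual code): the first `backward(retain_graph=True)` keeps the graph alive, `optimizer.step()` then updates the parameters inplace, and the second `backward()` through the now-stale graph raises the error.

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4, requires_grad=True)
loss = model(x).mean()

loss.backward(retain_graph=True)  # keep the graph for a second backward
optimizer.step()                  # inplace update of model.weight / model.bias

try:
    loss.backward()               # backward through the stale graph
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    print(type(e).__name__, e)
```

If you only need gradients from a single loss per iteration, dropping `retain_graph=True` (and recomputing the forward pass before each backward) usually avoids the error entirely.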