KFrank
(K. Frank)
May 14, 2023, 4:52pm
Hi Mahmoud!
For some suggestions about how to debug such inplace-modification errors,
see this post:
Hi Fahmyadan and Sangyoon!
Here are some suggestions about how to track down (and maybe fix)
inplace-modification errors. Note that an inplace modification in the forward
pass is not necessarily an error – it depends on whether and how the tensor
that was modified is used in the backward pass. Note that inplace operations
can be useful for saving memory – if you replace an innocent inplace operation
with an out-of-place equivalent, your training will use more memory (and, to a
minor e…
# autoencoder 1
self.AE_opt1.zero_grad()
AE_loss1 = self.AE_criterion(o1, x1)
AE_loss1.backward(retain_graph=True)
self.AE_opt1.step()

# autoencoder 2
self.AE_opt2.zero_grad()
AE_loss2 = self.AE_criterion(o2, x2)
AE_loss2.backward(retain_graph=True)
self.AE_opt2.step()

# autoencoder 3
self.AE_opt3.zero_grad()
AE_loss3 = self.AE_criterion(o3, x3)
AE_loss3.backward(retain_graph=True)
self.AE_opt3.step()
As discussed in the linked post, these backward(retain_graph=True) calls
often cause inplace-modification errors: each .step() call modifies the
model's parameters inplace, so a later backward() pass through a retained
graph that saved those parameters will fail. You should make a particular
point of looking at these calls while debugging your issue.
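Here is a minimal sketch of the failure mode (a hypothetical toy model, not
your actual code): one backward(retain_graph=True), then an optimizer step
(which updates the parameters inplace), then a second backward through the
retained graph.

```python
import torch

lin = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

x = torch.randn(5, 3, requires_grad=True)
loss = lin(x).sum()

loss.backward(retain_graph=True)  # keep the graph for a second backward
opt.step()                        # inplace update of lin.weight / lin.bias

err = None
try:
    loss.backward()  # the retained graph saved the now-modified weight
except RuntimeError as e:
    err = e
print(err)  # inplace-modification RuntimeError
```

A common fix is to perform all of the backward() passes before any of the
.step() calls, or to recompute each loss separately so that no graph needs
to be retained across an optimizer step.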
Best.
K. Frank