Training breaks in PyTorch > 1.5.0: throws in-place modification error

So, I was following this Hidden Markov Model tutorial: https://colab.research.google.com/drive/1IUe9lfoIiQsL49atSOgxnCmMR_zJazKI#scrollTo=3CMdK1EfE1SJ

The limitation it has is that it breaks when PyTorch is > 1.5.0. While training the forward algorithm, it throws:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 1024]], which is output 0 of TransposeBackward0, is at version 23; expected version 22 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Setting set_detect_anomaly doesn’t give any other output. But the thing is, transpose is not used in any funky way where an in-place operation would be done, is it?
Plus, it works without any error when PyTorch is downgraded to 1.5.0.

Setting set_detect_anomaly doesn’t give any other output

Make sure to use the latest PyTorch, as we recently fixed warnings not showing up in Colab.
Or run your code from the command line so that the anomaly detection output shows the corresponding forward code.
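
For reference, a minimal sketch of what that looks like; the tensors and loss below are just placeholders, not the HMM code from the notebook:

    import torch

    # Turning anomaly detection on before the forward pass makes the backward
    # error also print the forward-pass stack trace of the op that produced
    # the offending tensor.
    torch.autograd.set_detect_anomaly(True)

    x = torch.randn(64, 1024, requires_grad=True)    # placeholder input
    w = torch.randn(1024, 1024, requires_grad=True)  # placeholder parameter

    out = x @ w        # stand-in for the real forward pass
    loss = out.sum()   # stand-in loss
    loss.backward()    # an inplace error here now includes the forward trace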

The code does quite a lot of in-place and viewing ops.

Plus, it works without any error when PyTorch is downgraded to 1.5.0.

This kind of check is here to make sure we don’t silently compute wrong gradients. So it is most likely that the old behavior was silently computing wrong gradients, and this has been fixed in more recent versions.
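
As a small illustration (nothing to do with the HMM code itself), this is the kind of situation the check protects against: an intermediate tensor that autograd saved for the backward pass gets modified in place afterwards:

    import torch

    a = torch.randn(3, requires_grad=True)
    b = a * 2
    c = b ** 2        # pow saves b for the backward pass (d/db of b**2 is 2*b)

    b.add_(1)         # in-place change bumps b's version counter

    c.sum().backward()  # RuntimeError: ... modified by an inplace operation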


Aha! It was absolutely amazing how my output changed when I turned on anomaly detection in my own terminal! Thank you so much for the help, it got fixed :slight_smile: Apparently, torch.stack was causing the problem. I changed it to

log_A_expanded = log_A.unsqueeze(2).expand((m, n, p))
log_B_expanded = log_B.unsqueeze(0).expand((m, n, p))

even though it basically works the same way, it no longer triggers the error :slight_smile:
Though I am a little confused about why it might have failed in

	log_A_expanded = torch.stack([log_A] * p, dim=2)
	log_B_expanded = torch.stack([log_B] * m, dim=0)

Do you have any insight?

The expand version will be much more efficient for sure, as it does not actually allocate new memory and does not do any copy.

They do behave differently in subtle ways, but I am not sure of the exact reason here :confused:
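
For what it is worth, here is a small sketch with made-up sizes of how the two versions compare: the values come out identical, but stack materializes a new tensor by copying, while expand only returns a broadcasted view of the original storage:

    import torch

    m, n, p = 3, 4, 5                 # made-up sizes for illustration
    log_A = torch.randn(m, n)

    # stack: allocates a new (m, n, p) tensor and copies log_A p times
    stacked = torch.stack([log_A] * p, dim=2)

    # expand: no allocation, just a view with stride 0 along the new dim
    expanded = log_A.unsqueeze(2).expand(m, n, p)

    print(torch.equal(stacked, expanded))           # True  -> same values
    print(stacked.data_ptr() == log_A.data_ptr())   # False -> stack copied
    print(expanded.data_ptr() == log_A.data_ptr())  # True  -> expand shares storage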
