When I execute the following code, there is no error:
```python
import torch

A = torch.eye(3, 3, requires_grad=True)
y = torch.randn(5, 2, 3, 3)
y[:, 1, :, :] = y[:, 0, :, :] @ A
y.sum().backward()
```
But with a slight modification, changing the size of the first dimension from 5 to 1, an error occurs:
```python
A = torch.eye(3, 3, requires_grad=True)
y = torch.randn(1, 2, 3, 3)
y[:, 1, :, :] = y[:, 0, :, :] @ A
y.mean().backward()
```
The error is:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3, 3]], which is output 0 of AsStridedBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
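Following the hint in the error message, here is a minimal sketch of how the failing snippet can be re-run under anomaly detection, which adds a second traceback pointing at the forward operation whose saved tensor was later modified in place:

```python
import torch

# backward() still raises, but anomaly detection attaches a traceback
# identifying the forward op (the matmul) whose saved tensor was
# modified by the in-place slice assignment.
with torch.autograd.detect_anomaly():
    A = torch.eye(3, 3, requires_grad=True)
    y = torch.randn(1, 2, 3, 3)
    y[:, 1, :, :] = y[:, 0, :, :] @ A
    y.mean().backward()
```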
My current PyTorch version is `2.0.0+cu118`; I also tested `1.13.0+cu117`, and the error still exists. (I didn't test other versions, so I'm not sure about them.)
Slice assignment is always an in-place write into `y`, so my guess is that the difference lies in what the matmul saves for backward: when the first dimension is 1, the slice `y[:, 0, :, :]` seems to be saved as a view of `y` (note the `AsStridedBackward0` in the error), so the subsequent in-place assignment bumps its version counter and breaks the computation graph; when it is 5, the non-contiguous slice is apparently copied first, so the saved tensor is unaffected.
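To illustrate the shared version counter, here is a small diagnostic sketch. It relies on `Tensor._version`, an internal attribute, so this is only for inspection:

```python
import torch

y = torch.randn(1, 2, 3, 3)
v = y[:, 0, :, :]              # a view sharing y's storage and version counter
print(y._version, v._version)  # 0 0

y[:, 1, :, :] = 0.0            # slice assignment is an in-place write into y
print(y._version, v._version)  # 1 1 -- the view's version is bumped too
```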
A simple workaround is to `clone()` the slice manually, as follows:
```python
A = torch.eye(3, 3, requires_grad=True)
y = torch.randn(1, 2, 3, 3)
y[:, 1, :, :] = y[:, 0, :, :].clone() @ A
y.mean().backward()
```
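Another option is to avoid the in-place slice assignment entirely and build `y` out of place, for example with `torch.stack`. This is just a sketch equivalent to the failing 1-batch case above:

```python
import torch

A = torch.eye(3, 3, requires_grad=True)
x = torch.randn(1, 3, 3)

# Construct y out of place instead of writing into a slice of an
# existing tensor, so no saved tensor is ever modified.
y = torch.stack([x, x @ A], dim=1)  # shape (1, 2, 3, 3)
y.mean().backward()
```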
I'm wondering: are there more details about slice assignment in the documentation?
Is the behavior I encountered intentional, perhaps a performance optimization, or is it a bug?