Could anyone tell me how torch.roll works in the backward pass? I was trying to check how the gradients change during backpropagation, but it seems there is no gradient information for the torch.roll layer.
The backward operation of roll would just undo the roll operation on the tensor:
x = torch.randn(10, requires_grad=True)
y = x * torch.arange(10)
y.mean().backward()
print(x.grad)
# > tensor([0.0000, 0.1000, 0.2000, 0.3000, 0.4000, 0.5000, 0.6000, 0.7000, 0.8000, 0.9000])
x.grad = None  # reset the gradient before the second backward pass
y = x.roll(1) * torch.arange(10)
y.mean().backward()
print(x.grad)
# > tensor([0.1000, 0.2000, 0.3000, 0.4000, 0.5000, 0.6000, 0.7000, 0.8000, 0.9000, 0.0000])
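To make the "undo" behavior explicit, here is a minimal check (assuming a recent PyTorch) that the gradient flowing through roll(1) is the upstream gradient rolled back by -1:

```python
import torch

x = torch.randn(10, requires_grad=True)
w = torch.arange(10, dtype=torch.float32)

# Forward: roll x by 1, then scale elementwise by w
y = x.roll(1) * w
y.mean().backward()

# The gradient of y.mean() w.r.t. y is w / 10; the roll backward
# should apply the inverse shift, i.e. roll it by -1.
expected = (w / 10).roll(-1)
print(torch.allclose(x.grad, expected))
```

This matches the printed gradients above: each entry of w / 10 just shows up shifted one position to the left.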