Hi guys, please see the following image.
I don’t get why the third element of a.grad is -0.333. Could anyone fill me in?
The gradient you are seeing accumulates contributions from both the division and the max operation.
Here is a step-by-step way to separate the two paths (detaching one path at a time):
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
# Get div gradients
y = a / a.max().detach()
y.backward(torch.ones_like(y))
print('div grad: ', a.grad)
a.grad.zero_()
# Get max gradients
y = a.detach() / a.max()
y.backward(torch.ones_like(y))
print('max grad: ', a.grad)
a.grad.zero_()
# Get both
y = a / a.max()
y.backward(torch.ones_like(y))
print('both: ', a.grad)
div grad: tensor([0.3333, 0.3333, 0.3333])
max grad: tensor([ 0.0000, 0.0000, -0.6667])
both: tensor([ 0.3333, 0.3333, -0.3333])
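To see where the numbers come from, here is a quick analytical check. With y_i = a_i / m and m = a.max() = 3, the division path gives dy_i/da_i = 1/m = 1/3 for every element, and the max path adds sum_i(-a_i / m^2) = -6/9 = -2/3 to the max element only. So the third element gets 1/3 - 2/3 = -1/3:

```python
import torch

# y_i = a_i / m, with m = a.max() = 3 (the third element).
# Division path: dy_i/da_i = 1/m for every i.
# Max path: dy_i/dm = -a_i / m**2, and dm/da_3 = 1,
#           so the max element also receives sum_i(-a_i / m**2) = -6/9.
a = torch.tensor([1., 2., 3.], requires_grad=True)
y = a / a.max()
y.backward(torch.ones_like(y))

m = 3.0
expected = torch.tensor([1/m, 1/m, 1/m - (1.0 + 2.0 + 3.0) / m**2])
print(a.grad)                             # tensor([ 0.3333,  0.3333, -0.3333])
print(torch.allclose(a.grad, expected))   # True
```

This matches the two partial results above: 0.3333 + (-0.6667) = -0.3333 for the third element.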
Much appreciated. I get it now.