Why is the gradient the same before and after I call net.zero_grad() and then run backward on the loss?

print(out)
target = Variable(t.arange(0, 10)).float()
criterion = nn.MSELoss()
loss = criterion(out, target)
print(loss)
# Zero the gradients
net.zero_grad()
print("Gradient of conv1.bias before backpropagation")
print(net.conv1.bias.backward)
loss.backward(retain_graph=True)
print("Gradient after backpropagation")
print(net.conv1.bias.backward)

Gradient of conv1.bias before backpropagation
<bound method Tensor.backward of Parameter containing:
tensor([ 0.0900, 0.1849, -0.1985, -0.0836, -0.1590, 0.0689],
requires_grad=True)>
Gradient after backpropagation
<bound method Tensor.backward of Parameter containing:
tensor([ 0.0900, 0.1849, -0.1985, -0.0836, -0.1590, 0.0689],
requires_grad=True)>

  1. net.zero_grad() zeroes out the .grad attribute of every parameter in the net.
  2. loss.backward() computes the gradient of the loss w.r.t. every parameter and accumulates it into each parameter's .grad attribute.

To see the gradient of a parameter, print net.conv1.bias.grad instead of net.conv1.bias.backward.
backward is a bound method, so printing it shows the method object bound to the parameter, which displays the parameter's values, not its gradients.

The parameter values stay the same before and after the backward() call because backward() only computes and stores gradients; you need to call optimizer.step() to actually update the parameters.
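A minimal sketch of the whole cycle, using a single nn.Linear layer in place of the original net (the layer, sizes, and optimizer here are assumptions for illustration): .grad is None before the first backward, backward() fills in .grad without touching the parameters, and only optimizer.step() changes them.

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)                      # stand-in for the original net
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

out = net(torch.randn(1, 4))
loss = criterion(out, torch.zeros(1, 2))

optimizer.zero_grad()
print(net.bias.grad)                       # None: no backward has run yet
loss.backward()                            # computes gradients, parameters unchanged
print(net.bias.grad)                       # now a gradient tensor

before = net.bias.detach().clone()
optimizer.step()                           # only this call updates the parameters
print(torch.equal(before, net.bias))       # the bias has changed
```

Printing net.bias.grad at each stage (rather than net.bias.backward) is what reveals the difference the question was looking for.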


You are right!
Thanks!