I’m sorry for asking this question, but I really don’t know the answer.

They do the same thing; the only difference is that one is called through the optim module while the other is called through the nn module. Let’s have a look at the docs.

From the optim doc https://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html, you have:

```
def zero_grad(self):
    r"""Clears the gradients of all optimized :class:`torch.Tensor` s."""
    for group in self.param_groups:
        for p in group['params']:
            if p.grad is not None:
                p.grad.detach_()
                p.grad.zero_()
```

while from the nn doc (https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html#Module.zero_grad), you have:

```
def zero_grad(self):
    r"""Sets gradients of all model parameters to zero."""
    for p in self.parameters():
        if p.grad is not None:
            p.grad.detach_()
            p.grad.zero_()
```

The only difference is the outer loop `for group in self.param_groups`: `param_groups` holds the parameter groups you passed in when you initialized the optimizer (typically `model.parameters()`). So if the optimizer was constructed with all of the model’s parameters, the two calls zero exactly the same gradient tensors.
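To see the equivalence concretely, here is a minimal sketch (the tiny `nn.Linear` model and SGD optimizer are just illustrative choices, not from the post) showing that after either call, every parameter’s gradient has been cleared. Note that recent PyTorch versions default to setting gradients to `None` rather than zeroing them in place, so the check below accepts both behaviors:

```python
import torch
import torch.nn as nn

# Illustrative model; the optimizer is given all of its parameters.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def grads_cleared(m):
    # True if every gradient is either None (newer default) or all zeros.
    return all(p.grad is None or not p.grad.any() for p in m.parameters())

# Populate gradients, then clear them via the optimizer.
model(torch.randn(3, 4)).sum().backward()
optimizer.zero_grad()
print(grads_cleared(model))  # True

# Populate gradients again, then clear them via the module.
model(torch.randn(3, 4)).sum().backward()
model.zero_grad()
print(grads_cleared(model))  # True
```

Both calls touch the same tensors, because the optimizer’s `param_groups` reference the very parameters returned by `model.parameters()`.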
