What is the difference between net.zero_grad() and optimiser.zero_grad()?

Sorry for asking such a basic question, but I really don't know the answer.

They do the same thing; the only difference is that one is called through the optim module while the other is called through the nn module. Let's have a look at the docs.

From the optim doc (https://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html), you have:

def zero_grad(self):
    r"""Clears the gradients of all optimized :class:`torch.Tensor` s."""
    for group in self.param_groups:
        for p in group['params']:
            if p.grad is not None:
                p.grad.detach_()
                p.grad.zero_()

while from the nn doc (https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html#Module.zero_grad), you have:

def zero_grad(self):
    r"""Sets gradients of all model parameters to zero."""
    for p in self.parameters():
        if p.grad is not None:
            p.grad.detach_()
            p.grad.zero_()

The only difference is the `for group in self.param_groups` loop, which iterates over the parameter groups you passed to the optimizer when you constructed it. So the two calls are equivalent as long as the optimizer was created with all of the model's parameters; if you only passed it a subset, optimiser.zero_grad() will only clear the gradients of that subset.
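To make the equivalence concrete, here is a minimal sketch (the toy nn.Linear model, SGD optimizer, and the names net and optimiser are just placeholders for illustration):

import torch
import torch.nn as nn
import torch.optim as optim

# A toy model and an optimizer constructed with all of its parameters.
net = nn.Linear(4, 2)
optimiser = optim.SGD(net.parameters(), lr=0.1)

# Produce some gradients.
loss = net(torch.randn(8, 4)).sum()
loss.backward()

# Either call clears the same gradients, because the optimiser holds
# exactly the parameters returned by net.parameters().
net.zero_grad()
# optimiser.zero_grad()  # equivalent here

# All gradients are now cleared (zeroed, or None depending on the PyTorch version).
print(all(p.grad is None or bool((p.grad == 0).all()) for p in net.parameters()))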
