Zero grad optimizer or net?

What should we use to clear out the gradients accumulated for the parameters of the network?

 optimizer.zero_grad()
 net.zero_grad()

I have seen tutorials use them interchangeably. Are they the same or different? If different, what is the difference and do you need to execute both?


If optimizer = optim.Optimizer(net.parameters()), they are the same, since nn.Module.zero_grad() is implemented as:

    def zero_grad(self):
        """Sets gradients of all model parameters to zero."""
        for p in self.parameters():
            if p.grad is not None:
                p.grad.data.zero_()
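
For example (a minimal sketch; the toy net, the SGD optimizer, and the dummy input are just placeholders to make it runnable):

    import torch
    import torch.nn as nn
    import torch.optim as optim

    net = nn.Linear(4, 2)
    optimizer = optim.SGD(net.parameters(), lr=0.1)  # optimizer holds *all* parameters of net

    net(torch.randn(8, 4)).sum().backward()  # populate .grad

    # with this setup the two calls are interchangeable:
    optimizer.zero_grad()  # or equivalently: net.zero_grad()
    for p in net.parameters():
        assert p.grad is None or torch.all(p.grad == 0)  # gradients cleared (or set to None)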

We’re supposed to clear the gradients each iteration before calling loss.backward() and optimizer.step(), correct?


Yes, that’s the suggested usage.
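
A minimal sketch of that usage (model, criterion, optimizer, and the data loader are assumed to be defined elsewhere):

    for inputs, targets in dataloader:
        optimizer.zero_grad()                    # clear gradients left over from the previous iteration
        loss = criterion(model(inputs), targets)
        loss.backward()                          # accumulate fresh gradients
        optimizer.step()                         # update the parameters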


Hi,
I am trying an example adapted from here: [image: nn02]

I copy-pasted the example and an exception is thrown.

Has the API changed?
Thanks,

These days, if there isn’t a gradient (yet), .grad can be None.
When I ran into that while trying some code, I fixed it with an if w1.grad is not None: check.
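
Something like this (a rough sketch; w1 just stands in for whichever tensor the tutorial updates by hand):

    import torch

    w1 = torch.randn(3, 3, requires_grad=True)

    # before the first backward() call, w1.grad is still None,
    # so guard any manual zeroing with a check
    if w1.grad is not None:
        w1.grad.zero_()

    (w1 * 2).sum().backward()  # now w1.grad exists and can be zeroed directly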

Best regards

Thomas


Hi,
I’m trying to use an optimizer only on a part of the module parameters. Should I call module.zero_grad() instead of optimizer.zero_grad() if there are other layers between the loss and the layer that I’m training?

And what if I need to train only the last layer (right before the loss) of a module? The gradients for the previous layers wouldn’t be computed at all. So there is no difference between calling module.zero_grad() and calling optimizer.zero_grad(), right?


Any solution regarding this question?

I also think there is no difference between model.zero_grad() and model_optimizer.zero_grad().

As @chenyuntc explained, if you pass all parameters of your model to the optimizer, both calls will be equal.
However, there might be use cases where you would like to use different optimizers for different parts of your model. In such a case, model.zero_grad() would clear the gradients of all parameters of the model, while the optimizerX.zero_grad() call will just clear the gradients of the parameters that were passed to it.
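
A small sketch of that second case (the two-layer model and the two optimizers are made up for illustration):

    import torch.nn as nn
    import torch.optim as optim

    model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 2))
    opt_a = optim.SGD(model[0].parameters(), lr=0.01)     # only the first layer
    opt_b = optim.Adam(model[1].parameters(), lr=0.001)   # only the second layer

    opt_a.zero_grad()    # clears gradients of model[0] only
    model.zero_grad()    # clears gradients of every parameter in the model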


But what if we add an extra layer like this:

classifier = nn.Linear(128, 200, bias=False)
nn.init.xavier_uniform_(classifier.weight)
model.add_module("classify", classifier)

but we do not want to update the parameters of this layer. What should we do in this case?

Something like

for p in classifier.parameters():
    p.requires_grad_(False)

before training should do the trick. Also filter these parameters out of what you pass to the optimizer.
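
For the filtering part, one way it could look (model and the hyperparameters are placeholders):

    import torch.optim as optim

    # only hand the still-trainable parameters to the optimizer
    trainable_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = optim.SGD(trainable_params, lr=0.01, momentum=0.9)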

Best regards

Thomas


For example, when we create an optimizer like this:
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
we register the model’s parameters with this optimizer, so when we call optimizer.zero_grad() it will clear the gradients of these parameters. But if there is another model whose parameters were not registered with this optimizer, optimizer.zero_grad() will not clear its gradients.

@chenyuntc Can you please elaborate on your answer?

Thanks in advance…