What does zero_grad() do in this case?

Let's say I have a network called net1.

The optimizer for that network is defined as follows:

optimizer = torch.optim.Adam(net1.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))

I generally call optimizer.zero_grad() before backpropagating through net1. In some sample code for the same problem, I saw someone call net1.zero_grad() instead. What is the difference between optimizer.zero_grad() and net1.zero_grad()?

If you pass all of the model's parameters to your optimizer, it's the same:

optimizer = torch.optim.Adam(net1.parameters())

However, if you only pass a subset of the model's parameters to your optimizer, the two calls are clearly different.
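To make the difference concrete, here is a minimal sketch (the tiny two-layer model and the choice of passing only the second layer's parameters to Adam are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy model; only the second layer's parameters are given to the optimizer
net1 = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
optimizer = torch.optim.Adam(net1[1].parameters(), lr=1e-3, betas=(0.9, 0.999))

net1(torch.randn(3, 4)).sum().backward()  # populates grads for ALL parameters

optimizer.zero_grad()                      # clears grads of net1[1] only
print(net1[0].weight.grad.abs().sum())     # first layer's grad is untouched

net1.zero_grad()                           # clears grads of every parameter in net1
print(net1[0].weight.grad)                 # now cleared as well
```

In the common case where the optimizer was built from net1.parameters(), both calls clear exactly the same set of gradients.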


Thanks @ptrblck for your reply