Let's say I have a network called net1.
The optimizer for that network is defined as follows:
optimizer = torch.optim.Adam(net1.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))
I generally call optimizer.zero_grad() before backpropagating through net1. In some sample code for the same problem, I saw someone call net1.zero_grad() instead. What is the difference between optimizer.zero_grad() and net1.zero_grad()?
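Roughly, my training step looks like the sketch below (the loss, data, and layer sizes here are just placeholders, not my actual setup):

import torch
import torch.nn as nn

net1 = nn.Linear(10, 1)  # placeholder network
optimizer = torch.optim.Adam(net1.parameters(), lr=1e-3, betas=(0.5, 0.999))
criterion = nn.MSELoss()

inputs, targets = torch.randn(4, 10), torch.randn(4, 1)  # dummy batch

optimizer.zero_grad()    # what I normally do
# net1.zero_grad()       # what the sample code did instead
loss = criterion(net1(inputs), targets)
loss.backward()
optimizer.step()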