PyTorch: loading checkpoints to continue training - zero_grad() or not?

I have saved and loaded the checkpoints as per the PyTorch manual, and it all seems OK. Now, usually, when I start training, I have something like this in PyTorch:

    for itr in range(1, args.niters + 1):
        optimizer.zero_grad() # should I or should I not?

I am unsure if I should call zero_grad() here (which I do when I start training from scratch), since I am reloading all my weights and biases.

Apologies if this is a daft question.


Yes, you should still use optimizer.zero_grad(), since the only thing it does is clear the gradients. A checkpoint stores your model's parameters (weights and biases); it does not store any gradients. So when you resume training from a checkpoint, each backward pass still accumulates gradients into the parameters' .grad attributes, and you still want to reset them at the start of each iteration, exactly as you would when training from scratch.
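
To illustrate, here is a minimal sketch of saving a checkpoint, reloading it, and resuming training with zero_grad() in the loop. The model, optimizer, loss, and filename are all placeholders for whatever you actually use:

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer (swap in your own).
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# --- save a checkpoint: parameters and optimizer state, no gradients ---
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "checkpoint.pt")

# --- load it back and resume training ---
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])

for itr in range(1, 11):
    optimizer.zero_grad()           # clear gradients left over from the previous iteration
    x = torch.randn(8, 4)
    loss = model(x).pow(2).mean()   # dummy loss, just for illustration
    loss.backward()                 # accumulates gradients into each parameter's .grad
    optimizer.step()                # updates the parameters
```

Without the zero_grad() call, the gradients from every iteration would pile up in .grad and each step would use the running sum instead of the current gradient.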

And I don’t believe there are any daft questions (in fact, I just learned a new word :upside_down_face:)!