Zeroing out gradient in word embeddings tutorial

I noticed that in the Word Embeddings tutorial, the N-Gram Language Modeling example zeroes out the gradients with the following code:

# Step 2. Recall that torch *accumulates* gradients. Before passing in a
# new instance, you need to zero out the gradients from the old
# instance
model.zero_grad()

My understanding is that this line would normally be optimizer.zero_grad(). Why is it different here?


optimizer.zero_grad() is equivalent to model.zero_grad() when all of the model's parameters were passed to the optimizer, which is the case in this tutorial.
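A minimal sketch of that equivalence, using a tiny nn.Linear as a stand-in for the tutorial's model: since the optimizer is constructed from model.parameters(), both calls clear the gradients of exactly the same tensors. (Note that recent PyTorch versions set gradients to None by default rather than zeroing them in place.)

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for the tutorial's NGramLanguageModeler.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(3, 4)).sum()
loss.backward()
# Gradients have accumulated on every parameter.
assert all(p.grad is not None for p in model.parameters())

# Because the optimizer holds exactly the model's parameters,
# this has the same effect as optimizer.zero_grad():
model.zero_grad()

# All gradients are cleared (either set to None or zeroed in place,
# depending on the PyTorch version).
assert all(p.grad is None or not bool(p.grad.any()) for p in model.parameters())
```

The two calls would only differ if the optimizer were built over a subset of the parameters, or over parameters from several models.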
