I noticed that in the Word Embeddings tutorial, the N-Gram Language Modelling example zeroes out the gradient with the following line of code:
    # Step 2. Recall that torch *accumulates* gradients. Before passing in a
    # new instance, you need to zero out the gradients from the old
    # instance
    model.zero_grad()
My understanding is that this line should normally be optimizer.zero_grad(). Why does the tutorial do it differently here?
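For context, here is a minimal sketch of the pattern I am asking about. The model, data, and hyperparameters below are toy stand-ins I made up, not the tutorial's actual NGramLanguageModeler code; it just shows the two calls side by side in a training step where the optimizer was built from model.parameters().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class ToyNGramModel(nn.Module):
    """A small stand-in for the tutorial's n-gram model (illustrative only)."""
    def __init__(self, vocab_size, embedding_dim, context_size):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(context_size * embedding_dim, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view(1, -1)
        return F.log_softmax(self.linear(embeds), dim=1)

model = ToyNGramModel(vocab_size=10, embedding_dim=8, context_size=2)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

context = torch.tensor([3, 5])  # two context-word indices (made up)
target = torch.tensor([7])      # the word to predict (made up)

# Variant used in the tutorial: clear accumulated gradients via the module.
model.zero_grad()

# Variant the question suggests: clear them via the optimizer. Since the
# optimizer was constructed from model.parameters(), both calls reset the
# .grad fields of the same parameter tensors in this setup.
optimizer.zero_grad()

log_probs = model(context)
loss = loss_function(log_probs, target)
loss.backward()
optimizer.step()
```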