Optimizer issues

raesol · September 15, 2022, 3:46pm

Does the order of optimizer.zero_grad() and optimizer.step() matter? In the training part of my code, when I put optimizer.zero_grad() before optimizer.step(), I got an accuracy of 52% and 50% for training and testing, respectively. However, when I reversed the order of step and zero_grad, the accuracy increased to 99% and 68% for training and testing, respectively. If zero_grad is actually zeroing the parameters, then why are we getting an accuracy as high as 50%? Thank you in advance.

srishti-git1110 · September 15, 2022, 3:56pm

The order definitely matters.

The rule is to zero out the previous gradients before the next backward pass, so that gradients don't accumulate across iterations, and then call step() to update the parameters.
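A small illustration of the accumulation problem, as a sketch: the single scalar parameter `w` and the loss `3 * w` are made up for this example, chosen so that the gradient of one backward pass is exactly 3. Without zeroing, two backward passes sum into `w.grad`; calling `optimizer.zero_grad()` before each backward pass keeps only the latest gradient.

```python
import torch

# Hypothetical toy parameter; loss = 3 * w, so d(loss)/dw = 3 for every pass
w = torch.nn.Parameter(torch.tensor(1.0))
opt = torch.optim.SGD([w], lr=0.1)

# Two backward passes WITHOUT zeroing: both gradients are summed into w.grad
for _ in range(2):
    loss = 3 * w
    loss.backward()
accumulated = w.grad.item()

# Zeroing before each backward pass keeps only the most recent gradient
for _ in range(2):
    opt.zero_grad()
    loss = 3 * w
    loss.backward()
fresh = w.grad.item()

print(accumulated, fresh)  # 6.0 3.0
```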

An order that works fine:

optimizer.zero_grad()
loss.backward()
optimizer.step()
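For context, here is a minimal sketch of that order inside a complete training loop. The data, model, loss and learning rate are all assumptions made for illustration (a roughly balanced, linearly separable 2-class toy problem with a logistic-regression model), not the original poster's setup.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data: label is 1 when x0 + x1 > 0 (roughly balanced, separable)
x = torch.randn(200, 2)
y = (x[:, 0] + x[:, 1] > 0).float().unsqueeze(1)

model = nn.Linear(2, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for epoch in range(200):
    optimizer.zero_grad()          # 1. clear gradients left over from the previous iteration
    loss = criterion(model(x), y)  # forward pass
    loss.backward()                # 2. compute fresh gradients
    optimizer.step()               # 3. update the parameters using those gradients

with torch.no_grad():
    accuracy = ((model(x) > 0).float() == y).float().mean().item()
print(f"train accuracy: {accuracy:.2f}")
```

Full-batch training is used only to keep the sketch short; with a DataLoader the same three calls would run once per mini-batch.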

Can you please send your two code versions that produced different results?

ptrblck · September 15, 2022, 8:05pm

zero_grad() is setting the gradients to zero, not the parameters.
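A quick way to check this yourself, sketched with an arbitrary `nn.Linear` layer and random input (both assumptions for the example): after `optimizer.zero_grad()` the weights are identical to before, and only the `.grad` attributes are cleared. Depending on the PyTorch version and the `set_to_none` argument, cleared gradients are either `None` or zero-filled tensors, so the check accepts both.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Populate .grad with some gradients
model(torch.randn(8, 4)).sum().backward()
weights_before = model.weight.detach().clone()

optimizer.zero_grad()

# The parameters themselves are untouched...
params_unchanged = torch.equal(model.weight, weights_before)
# ...only the gradients are cleared (None or all zeros, depending on set_to_none)
grads_cleared = all(p.grad is None or torch.all(p.grad == 0) for p in model.parameters())
print(params_unchanged, grads_cleared)  # True True
```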

If you are working on a 2-class classification problem and the dataset is balanced, a 50% accuracy corresponds to a “random” classifier, i.e. the worst case. A significantly lower accuracy shouldn’t be expected, since it would mean your model had started to “learn” to predict the opposite labels.
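This also explains the ~50% in the question. If zero_grad() runs between backward() and step(), step() has no gradients left to apply, so (assuming plain SGD without momentum or weight decay, as in this sketch) the parameters never move and the model stays at its untrained, essentially random initialization. The toy data and model below are hypothetical and only serve to show that the parameters are unchanged after "training"; the printed accuracy is simply whatever the initial model happens to score.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data, same shape of problem as a balanced 2-class task
x = torch.randn(200, 2)
y = (x[:, 0] + x[:, 1] > 0).float().unsqueeze(1)

model = nn.Linear(2, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)  # plain SGD: no momentum, no weight decay
initial = [p.detach().clone() for p in model.parameters()]

for _ in range(200):
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.zero_grad()  # WRONG place: wipes the gradients step() is about to use
    optimizer.step()       # nothing to apply, so the parameters keep their initial values

unchanged = all(torch.equal(p, p0) for p, p0 in zip(model.parameters(), initial))
with torch.no_grad():
    accuracy = ((model(x) > 0).float() == y).float().mean().item()
print(unchanged, f"{accuracy:.2f}")
```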