PyTorch Gradients

Note that loss functions average over the batch size by default, and gradients from repeated backward() calls are summed, so if you do multiple backprops before an optimizer step you might need to divide by the number of for loop iterations.
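
Here is a minimal sketch of what that looks like. The model, the data shapes, and `num_chunks` are made up for illustration, and it assumes the default `reduction="mean"` and equal-sized chunks. It compares the gradient from one full-batch backward pass with the gradient accumulated over several smaller ones, where each loss is divided by the number of iterations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup, only for illustration.
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()  # default reduction="mean" averages over the batch
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

# Reference: a single backward pass over the full batch.
model.zero_grad()
loss_fn(model(inputs), targets).backward()
full_batch_grad = model.weight.grad.clone()

# Several backward passes over equal-sized chunks. Gradients are summed
# into .grad across calls, so each loss is divided by the number of
# loop iterations to get back the full-batch average.
num_chunks = 4
model.zero_grad()
for x, y in zip(inputs.chunk(num_chunks), targets.chunk(num_chunks)):
    loss = loss_fn(model(x), y) / num_chunks
    loss.backward()
accumulated_grad = model.weight.grad.clone()

grads_match = torch.allclose(full_batch_grad, accumulated_grad, atol=1e-6)
print(grads_match)
```

Without the division, the accumulated gradient would be `num_chunks` times larger than the full-batch one. If the chunks are not the same size, a plain division by the iteration count is only approximate.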
