PyTorch Gradients

apaszke · March 5, 2017, 10:25pm

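Note that loss functions average over the batch size by default, and gradients from successive backward calls are summed. So if you do multiple backprops before an optimizer step, you might need to divide by the number of for-loop iterations to get an average.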
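Here is a minimal sketch of what that looks like. It assumes equal-sized chunks and a loss with the default mean reduction. The model, the data, and the names `num_chunks`, `full_grad`, and `accum_grad` are made up for illustration. Each chunk's loss is divided by the number of loop iterations, so the accumulated gradient should match a single backward pass over the whole batch.

```python
import torch

torch.manual_seed(0)

# Toy model and data, purely illustrative.
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()  # default reduction averages over the batch

inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

# Reference: one backward pass over the full batch.
model.zero_grad()
loss_fn(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# Accumulation: several backward passes over equal-sized chunks.
# .backward() adds into .grad, so each chunk's mean loss is scaled
# by 1 / num_chunks to recover the full-batch average.
num_chunks = 4
model.zero_grad()
for x, y in zip(inputs.chunk(num_chunks), targets.chunk(num_chunks)):
    loss = loss_fn(model(x), y) / num_chunks
    loss.backward()
accum_grad = model.weight.grad.clone()

# Compare the two gradients up to floating-point tolerance.
grads_match = torch.allclose(full_grad, accum_grad, atol=1e-6)
print(grads_match)
```

If the chunks are not all the same size, dividing by the number of iterations is only approximate. In that case, weighting each chunk's loss by its share of the total samples would be the safer choice.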