Note that loss functions average over the batch size, so if you do multiple backprops and accumulate the gradients in a for loop, you might also need to divide by the number of loop iterations.
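To make the scaling concrete, here is a minimal framework-free sketch. It assumes a toy model `y = w * x` with a batch-mean squared-error loss and equally sized chunks; the helper names `mse_grad` and `accumulated_grad` are made up for illustration and do not come from any library. It compares the gradient of one full batch with gradients accumulated over several smaller backprops, each divided by the number of loop iterations.

```python
def mse_grad(w, xs, ys):
    """Gradient w.r.t. w of the batch-mean squared error for y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, num_chunks):
    """Accumulate per-chunk gradients, scaling each by 1 / num_chunks."""
    chunk = len(xs) // num_chunks  # assumption: len(xs) is divisible by num_chunks
    total = 0.0
    for i in range(num_chunks):
        lo, hi = i * chunk, (i + 1) * chunk
        # Each chunk's loss is already a mean over its own samples,
        # so dividing by num_chunks recovers the full-batch mean.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / num_chunks
    return total


# Toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = mse_grad(w, xs, ys)
accum = accumulated_grad(w, xs, ys, num_chunks=2)
unscaled = sum(mse_grad(w, xs[i:i + 2], ys[i:i + 2]) for i in (0, 2))

print(full, accum, unscaled)
```

With the division, the accumulated gradient matches the full-batch one; without it, the result is larger by a factor equal to the number of iterations. If the chunks have different sizes, a plain division by the iteration count is no longer exact and each chunk would need to be weighted by its share of the samples.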