Does PyTorch average or sum gradients over a minibatch?

For example, when you retrieve the gradients like so:

import torch.nn.functional as F

# `output` holds the model's log-probabilities, `target` the class labels
loss = F.nll_loss(output, target)
loss.backward()
for key, value in model.named_parameters():
    mygrad = value.grad  # gradient accumulated by backward()

is mygrad the sum of gradients over the minibatch or the average?


Well, it depends on the loss function you use, right? All autograd does is compute gradients; it has no notion of batching, so its behavior cannot change depending on how you batch. If you look at the doc of the loss function (http://pytorch.org/docs/master/nn.html#torch.nn.functional.nll_loss), you will notice that size_average=True in your case. That means the loss is averaged across the batch, so the gradients you read off the parameters are averages as well.
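As a quick sanity check, here is a minimal sketch (assuming a recent PyTorch where size_average has been replaced by the reduction argument; the toy model and data are made up for illustration). With reduction='sum' the gradients come out exactly batch_size times larger than with reduction='mean':

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(10, 5)                 # toy model, for illustration only
x = torch.randn(4, 10)                   # batch of 4 inputs
target = torch.randint(0, 5, (4,))       # random class labels

# Averaged loss (equivalent to size_average=True, the default)
model.zero_grad()
F.nll_loss(F.log_softmax(model(x), dim=1), target, reduction='mean').backward()
grad_mean = model.weight.grad.clone()

# Summed loss
model.zero_grad()
F.nll_loss(F.log_softmax(model(x), dim=1), target, reduction='sum').backward()
grad_sum = model.weight.grad.clone()

# The summed-loss gradient is batch_size times the averaged-loss gradient
print(torch.allclose(grad_sum, grad_mean * 4))  # True

So autograd just backpropagates whatever scalar you hand it; whether that scalar is a sum or a mean over the batch is decided entirely by the loss function's reduction.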
