For example, when you retrieve the gradients like so:
loss = F.nll_loss(output, target)
loss.backward()
for key, value in model.named_parameters():
    mygrad = value.grad
is mygrad the sum of the gradients over the minibatch, or the average?
Well, it depends on the loss function you use, right? All autograd does is compute the gradient; it has no notion of batching, and it cannot behave differently under different batching schemes. If you look at the documentation of the loss function here (http://pytorch.org/docs/master/nn.html#torch.nn.functional.nll_loss), you will notice that size_average=True by default, which applies in your case. That means the loss is averaged across the batch, so the values you read from value.grad are the average of the per-sample gradients over the minibatch.
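To make that concrete, here is a minimal sketch (the toy model, data, and target below are placeholders I made up; note that in newer PyTorch versions size_average=True is spelled reduction='mean' and size_average=False corresponds to reduction='sum'). With an averaged loss, .grad holds the batch-average gradient; with a summed loss, it holds the sum, which is batch_size times larger:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(10, 5)            # toy model, stands in for your network
data = torch.randn(4, 10)           # minibatch of 4 samples
target = torch.randint(0, 5, (4,))

# Averaged loss (the default, size_average=True): .grad is the batch average.
model.zero_grad()
output = F.log_softmax(model(data), dim=1)
F.nll_loss(output, target, reduction='mean').backward()
grad_mean = model.weight.grad.clone()

# Summed loss (size_average=False): .grad is the sum of per-sample gradients.
model.zero_grad()
output = F.log_softmax(model(data), dim=1)
F.nll_loss(output, target, reduction='sum').backward()
grad_sum = model.weight.grad.clone()

# The summed gradient is batch_size (here 4) times the averaged gradient.
print(torch.allclose(grad_sum, 4 * grad_mean))   # True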