Gradient calculation

When I call, for example, print(net.conv1.bias.grad), I get a vector of bias gradients. But since we train with batches, is my understanding correct that the gradients are summed over the batch and then the resulting vector is divided by batch_size, and that this averaged result is what print(net.conv1.bias.grad) shows?

Yes, your understanding is correct. Whether the resulting vector is divided by batch_size depends on the size_average=True option present in most loss functions (in newer PyTorch versions this is controlled by reduction='mean'). By default, the loss is averaged, so the gradients are divided by the batch size.
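To illustrate, here is a minimal sketch (using a hypothetical nn.Linear layer standing in for conv1) that compares the gradient from a single mean-reduced batch backward pass against per-sample gradients accumulated manually and divided by batch_size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny model: one linear layer standing in for net.conv1.
lin = nn.Linear(3, 2)
x = torch.randn(4, 3)       # batch_size = 4
target = torch.randn(4, 2)

# One backward pass over the whole batch with mean reduction (the default).
loss_fn = nn.MSELoss(reduction='mean')
loss_fn(lin(x), target).backward()
batch_grad = lin.bias.grad.clone()

# Per-sample gradients, summed and then divided by batch_size.
manual = torch.zeros_like(lin.bias)
for i in range(4):
    lin.zero_grad()
    loss_fn(lin(x[i:i + 1]), target[i:i + 1]).backward()
    manual += lin.bias.grad
manual /= 4

# With mean reduction, the batch gradient is the average of per-sample gradients.
print(torch.allclose(batch_grad, manual))
```

With reduction='sum' instead, the division by 4 would have to be dropped for the two results to match.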


@smth Thank you for the clarification!