Gradient calculation

When I call, for example, print(net.conv1.bias.grad), I get a vector of bias gradients. But since we train with batches, is my understanding correct that the gradients are summed over the batch and then the resulting vector is divided by batch_size, and that this averaged result is what print(net.conv1.bias.grad) shows?

Yes, your understanding is correct. Whether the resulting vector is divided by batch_size depends on the size_average=True option present in most loss functions (in newer PyTorch versions this is controlled by reduction='mean'). By default, the loss is averaged, so the gradients are divided by the batch size.
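To illustrate, here is a minimal sketch (using a hypothetical nn.Linear layer standing in for conv1) that compares the gradient from a single mean-reduced batch backward pass against per-sample gradients accumulated manually and divided by batch_size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny model: one linear layer standing in for net.conv1.
lin = nn.Linear(3, 2)
x = torch.randn(4, 3)       # batch_size = 4
target = torch.randn(4, 2)

# One backward pass over the whole batch with mean reduction (the default).
loss_fn = nn.MSELoss(reduction='mean')
loss_fn(lin(x), target).backward()
batch_grad = lin.bias.grad.clone()

# Per-sample gradients, summed and then divided by batch_size.
manual = torch.zeros_like(lin.bias)
for i in range(4):
    lin.zero_grad()
    loss_fn(lin(x[i:i + 1]), target[i:i + 1]).backward()
    manual += lin.bias.grad
manual /= 4

# With mean reduction, the batch gradient is the average of per-sample gradients.
print(torch.allclose(batch_grad, manual))
```

With reduction='sum' instead, the division by 4 would have to be dropped for the two results to match.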


@smth Thank you for the clarification!