Does Tensor.backward() accumulate gradients across calls?


I am training my model on a large dataset, which also requires a large batch size. Given limited GPU memory, I plan to split each mini-batch (1000 images) into 10 ‘micro-batches’ (100 images each). For each micro-batch, I call loss_micro-batch.backward() to calculate the gradients for the weights, and I call optimizer.step() once after processing all 10 micro-batches.

My question is: is this a valid approach to work around GPU memory limits? Does each call to loss_micro-batch.backward() accumulate the gradients for the weights? Or do I have to sum the micro-batch losses into a mini-batch loss and call sum_loss_micro-batch.backward()? I am worried that my approach would make the effective mini-batch size the same as my micro-batch size.


Versions: Python 3.6, PyTorch 0.4


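Yes, it will accumulate the gradients. Each call to backward() adds the new gradients into the .grad of every parameter instead of overwriting it, and the weights themselves only change when you call optimizer.step(). So calling optimizer.step() after the 10 micro-batches gives an update based on all 1000 images, and you do not need to sum the losses first. To reset the gradients to 0 after doing your update, you can use optimizer.zero_grad(). One thing to watch: if your loss averages over the samples in a micro-batch, divide each micro-batch loss by 10, otherwise the accumulated gradient is 10 times the mini-batch average.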
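If you want to check this yourself, here is a minimal sketch. The toy model, the random data and the sizes (20 samples split into 4 micro-batches) are made up for illustration and are not from your setup. It compares the gradient from one backward() over the full mini-batch with the gradient accumulated over the micro-batches, assuming a loss that averages over samples.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and data, only to illustrate the pattern.
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()  # averages over the samples it is given
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(20, 4)   # one mini-batch of 20 samples
targets = torch.randn(20, 1)
num_micro = 4                 # 4 micro-batches of 5 samples each

# Reference: a single backward() over the full mini-batch.
optimizer.zero_grad()
loss_fn(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# Accumulation: one backward() per micro-batch, no step() in between.
optimizer.zero_grad()
for x, y in zip(inputs.chunk(num_micro), targets.chunk(num_micro)):
    # Scale so the summed gradients match the mini-batch average.
    loss = loss_fn(model(x), y) / num_micro
    loss.backward()  # adds into .grad, does not overwrite it
accumulated_grad = model.weight.grad.clone()

# Should agree up to floating-point rounding.
grads_match = torch.allclose(full_grad, accumulated_grad, atol=1e-5)
print(grads_match)

optimizer.step()       # one weight update for the whole mini-batch
optimizer.zero_grad()  # clear gradients before the next mini-batch
```

Without the division by num_micro the two gradients would differ by a factor of num_micro, which you could also compensate for with the learning rate.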

I just didn’t find a decent way to verify that. Thank you!