Any suggestion on speeding up training with mini-batch size of 1?

When the mini-batch size is 1, it's often the case that building the model (i.e., constructing the graph on each forward pass), calling outputs.backward(), and optimizer.step() take more time than the actual gradient computation itself. Do you have any suggestions? I know the upcoming JIT support can potentially resolve the model-building issue, but the other two steps are still significant…
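
For reference, here is a minimal sketch of the kind of loop I mean, with per-phase timing (the nn.Linear model, shapes, and loss are just placeholders to make the three phases concrete):

```python
import time
import torch
import torch.nn as nn

model = nn.Linear(128, 10)  # toy placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(1, 128)      # mini-batch size of 1
target = torch.randn(1, 10)

for step in range(100):
    t0 = time.perf_counter()
    outputs = criterion(model(x), target)   # forward pass (graph building)
    t1 = time.perf_counter()
    optimizer.zero_grad()
    outputs.backward()                      # backward pass
    t2 = time.perf_counter()
    optimizer.step()                        # parameter update
    t3 = time.perf_counter()
    print(f"forward {t1 - t0:.6f}s  backward {t2 - t1:.6f}s  step {t3 - t2:.6f}s")
```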

Hi,

The JIT will help with both model building and the backward pass.
Unfortunately, I don't know of any way to speed up optimizer.step() further.
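
For example, a minimal sketch of tracing a model with torch.jit.trace (using the toy nn.Linear model from above as a stand-in): the graph is recorded once at trace time, so later calls avoid rebuilding it from Python.

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)                # same toy model as above
example = torch.randn(1, 128)             # example input used for tracing

traced = torch.jit.trace(model, example)  # record the graph once

out = traced(torch.randn(1, 128))         # later calls reuse the recorded graph
```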

Thanks! If backward() is also supported, then I think the docs have this wrong: they don't say that the JIT supports the torch Variable type.

Hi,

Tensors and Variables were merged a while ago. So it supports Tensors, both the ones with requires_grad=True and the ones without.
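
As a quick illustration of the merge (a small sketch; the tensor names are arbitrary), a plain Tensor now plays both roles the old Variable did:

```python
import torch

a = torch.randn(3)                       # plain tensor, no gradient tracking
b = torch.randn(3, requires_grad=True)   # tensor that participates in autograd

loss = (a * b).sum()
loss.backward()                          # works: gradients flow to b
print(b.grad)                            # d(loss)/db is just a
print(a.requires_grad, b.requires_grad)  # False True
```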