Any suggestions for speeding up training with a mini-batch size of 1?

When the mini-batch size is 1, building the model and calling outputs.backward() and optimizer.step() often take more time than the actual gradient computation. Do you have any suggestions? I know the upcoming JIT support can potentially resolve the model-building issue, but the other two steps are still significant…
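To see where the time actually goes, one option is to time each phase separately. The following is only a rough sketch: the tiny model, the random data, and the helper name `time_phases` are made up for illustration, and it runs on CPU (on a GPU you would need `torch.cuda.synchronize()` before each timestamp for the numbers to mean anything).

```python
import time

import torch
import torch.nn as nn


def time_phases(steps=50):
    """Accumulate wall-clock time for forward, backward and optimizer.step()."""
    torch.manual_seed(0)
    # Hypothetical small model; substitute your own.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(1, 16)  # mini-batch size of 1
    y = torch.randn(1, 1)
    totals = {"forward": 0.0, "backward": 0.0, "step": 0.0}
    for _ in range(steps):
        optimizer.zero_grad()
        t0 = time.perf_counter()
        loss = ((model(x) - y) ** 2).mean()  # forward pass builds the graph
        t1 = time.perf_counter()
        loss.backward()
        t2 = time.perf_counter()
        optimizer.step()
        t3 = time.perf_counter()
        totals["forward"] += t1 - t0
        totals["backward"] += t2 - t1
        totals["step"] += t3 - t2
    return totals


totals = time_phases()
for phase, seconds in totals.items():
    print(f"{phase}: {seconds * 1e3:.2f} ms total")
```

The absolute numbers depend heavily on the model and hardware, so treat them as a guide to which phase dominates rather than as a benchmark.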


The JIT will help with both model building and the backward pass.
Unfortunately, I don’t know of any way to speed up optimizer.step() further.
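For reference, a minimal sketch of what using the JIT could look like, assuming a model with no data-dependent control flow so that `torch.jit.trace` is applicable. The model and data here are placeholders, and whether tracing gives a noticeable speedup depends on the model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Placeholder model; tracing records the ops run on the example input.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
example = torch.randn(1, 16)
traced = torch.jit.trace(model, example)

# The traced module reproduces the original module's output.
same_output = torch.allclose(traced(example), model(example))

# The traced module is used in a training step like any other module.
optimizer = torch.optim.SGD(traced.parameters(), lr=0.01)
x = torch.randn(1, 16)
y = torch.randn(1, 1)
optimizer.zero_grad()
loss = ((traced(x) - y) ** 2).mean()
loss.backward()  # backward runs through the traced graph
has_grads = all(p.grad is not None for p in traced.parameters())
optimizer.step()
```

If the model does have data-dependent control flow, `torch.jit.script` is the alternative to look at, since tracing only records the path taken for the example input.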

Thanks! If backward() is also supported, then I think the docs are misleading: they do not say that the JIT supports the torch Variable type.


Tensors and Variables were merged a while ago. So it supports Tensors, both the ones with requires_grad=True and the ones without.
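A small sketch of that point, using a made-up function `f`: the same traced function accepts a plain Tensor and a Tensor that requires gradients, and autograd works through it in the latter case.

```python
import torch


def f(x):
    # Toy function for illustration: sum of squares.
    return (x * x).sum()


traced = torch.jit.trace(f, (torch.randn(3),))

# Tensor that requires gradients: backward works through the traced function.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
traced(a).backward()
print(a.grad)  # d/dx sum(x^2) = 2x, so tensor([2., 4., 6.])

# Plain Tensor: same call, but no autograd graph is attached to the result.
b = torch.tensor([1.0, 2.0, 3.0])
print(traced(b).requires_grad)  # False
```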