How about using DataParallel? You can split the training batches across multiple GPUs.
I would recommend first finding the smallest batch size you can use with only 1 GPU. For example, if the smallest batch you can run on a single GPU is N=8, then utilizing 4 GPUs allows you to increase the batch size to N=32.
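Here's a minimal sketch of what that looks like (the model and the sizes are just placeholders for whatever you're actually training):

```python
import torch
import torch.nn as nn

# Toy model standing in for your real one (hypothetical).
model = nn.Linear(128, 10)

if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch along dim 0 across the
    # visible GPUs, runs a replica on each, and gathers the outputs.
    model = nn.DataParallel(model)
model = model.cuda()

# If a single GPU handles N=8, then with 4 GPUs you can feed N=32:
# each replica still only sees a per-GPU chunk of 8.
inputs = torch.randn(32, 128).cuda()
outputs = model(inputs)  # outputs.shape == (32, 10)
```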
No, there is no point in doing that if you only have 1 GPU. In one of your options you were talking about multiple GPUs, so I assumed you had access to more than one.