I want to train VGG and ResNet from scratch, but these models are too big to fit on a single GPU: I always get CUDA out-of-memory errors with batch sizes of 128 or 256. Is there any way I can train these models? Should I use multiple GPUs, with each GPU processing a smaller batch?
If you have multiple GPUs, you could use e.g. DistributedDataParallel to split the batch across devices, so that each model replica (one per GPU) processes a smaller per-device batch.
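A minimal DDP sketch, assuming you launch it with torchrun (which sets `LOCAL_RANK` etc. for each process); the random `TensorDataset` here just stands in for your real data:

```python
import os
import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torchvision.models.resnet50().cuda(local_rank),
                device_ids=[local_rank])
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Dummy data standing in for a real dataset; the sampler shards it
    # across processes, so each GPU only sees its own slice of the batch.
    dataset = TensorDataset(torch.randn(512, 3, 224, 224),
                            torch.randint(0, 1000, (512,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for data, target in loader:
            data = data.cuda(local_rank, non_blocking=True)
            target = target.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()  # gradients are all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched e.g. as `torchrun --nproc_per_node=4 train.py`, a global batch of 128 becomes 32 samples per GPU.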
Alternatively, you could lower your batch size or use torch.utils.checkpoint to trade compute for memory.
[Recommended] For PyTorch >= 0.4, you can use torch.utils.checkpoint. It runs the model piece by piece: during the forward pass only the inputs to each segment are stored, and the intermediate activations inside a segment are recomputed during backward, trading extra compute for memory.
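A minimal sketch using checkpoint_sequential on VGG-16's convolutional trunk; the segment count of 4 and the dummy input are just for illustration:

```python
import torch
import torchvision
from torch.utils.checkpoint import checkpoint_sequential

model = torchvision.models.vgg16().cuda()
# Input must require grad so the checkpointed segments build a graph.
x = torch.randn(32, 3, 224, 224, device="cuda", requires_grad=True)

# Run model.features (an nn.Sequential) in 4 checkpointed segments:
# only segment boundaries are kept, the rest is recomputed in backward.
features = checkpoint_sequential(model.features, 4, x)
out = model.classifier(torch.flatten(model.avgpool(features), 1))
out.sum().backward()  # activations inside each segment are recomputed here
```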
[Impaired performance] You can also lower your batch size and accumulate gradients: run multiple forward/backward passes on small micro-batches, then take a single optimizer step (sketch below). The downside is that BatchNorm layers still compute their statistics over each small micro-batch; if you use multiple GPUs, NVIDIA Apex's SyncBatchNorm can help correct this by synchronizing the statistics across devices.
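A sketch of gradient accumulation with dummy data; `accum_steps` and the micro-batch size of 16 are arbitrary choices here:

```python
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accum_steps = 8  # 8 micro-batches of 16 ~ effective batch size of 128
optimizer.zero_grad()
for step in range(accum_steps):
    data = torch.randn(16, 3, 224, 224, device="cuda")    # dummy micro-batch
    target = torch.randint(0, 1000, (16,), device="cuda")
    loss = criterion(model(data), target)
    (loss / accum_steps).backward()  # scale so accumulated grads average out
optimizer.step()  # one step for the whole effective batch
```

Note that BatchNorm still only sees 16 samples per forward pass, which is why the synchronized statistics mentioned above can matter.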
[Impaired performance] Training the model in mixed precision is another way to cut memory usage: fp16 activations take half the space of fp32, at some cost in numerical range (sketch below).
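A minimal mixed-precision sketch using torch.cuda.amp (available since PyTorch 1.6; older setups used NVIDIA Apex amp for the same purpose):

```python
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

data = torch.randn(64, 3, 224, 224, device="cuda")   # dummy batch
target = torch.randint(0, 1000, (64,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():          # ops run in fp16 where safe
    loss = criterion(model(data), target)
scaler.scale(loss).backward()            # scale loss to avoid fp16 underflow
scaler.step(optimizer)                   # unscales grads, then steps
scaler.update()
```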