Using CPU memory as additional memory for GPU

salahelabyad · October 28, 2020, 6:50am

Is there a way in PyTorch to borrow memory from the CPU when training on the GPU? I am training a video-processing model and would like to increase the batch size, to find out whether a larger batch improves the results by training the model better, especially the BatchNorm3d layers.

The model requires a lot of memory. My CPU has more memory and can handle a larger batch size, while the GPU is much faster but limited in memory. So I would like to make the CPU memory usable by the GPU somehow, in order to increase the batch size (I use BatchNorm3d, and from what I understand the mini-batch size is a major factor).

I want to know whether PyTorch itself, or some external library for PyTorch, allows that, and whether the solution is available on Windows, as I do not use Linux (unless it is the only way and no alternative can be found).

ptrblck · October 28, 2020, 11:01am

I think Microsoft released a PyTorch package some time ago, where intermediate tensors could be pushed to the CPU temporarily to reduce the GPU memory usage.
However, I can’t remember the name at the moment and don’t know if it’s still maintained.

That being said, you could trade compute for memory via torch.utils.checkpoint.
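
For anyone landing here later, a minimal sketch of what that could look like for a small 3D-conv model. The module names (`VideoBlock`, `CheckpointedNet`), layer sizes, and input shape are made up for illustration and are not from the original poster's model; the `use_reentrant=False` argument assumes a reasonably recent PyTorch release.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class VideoBlock(nn.Module):
    """Hypothetical Conv3d + BatchNorm3d block, for illustration only."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.body(x)


class CheckpointedNet(nn.Module):
    """Stack of blocks whose activations are recomputed during backward."""

    def __init__(self, channels=4, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList(VideoBlock(channels) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            # Intermediate activations inside `block` are not kept;
            # they are recomputed when backward reaches this block.
            x = checkpoint(block, x, use_reentrant=False)
        return x


device = "cuda" if torch.cuda.is_available() else "cpu"
model = CheckpointedNet().to(device)

# Assumed input layout: (batch, channels, frames, height, width).
clip = torch.randn(2, 4, 8, 16, 16, device=device)

out = model(clip)
out.mean().backward()
```

One caveat that seems relevant to the BatchNorm3d use case: because each checkpointed block runs its forward pass a second time during backward, the BatchNorm running statistics get updated twice per step. The batch statistics used for normalization in training mode are unaffected, but it may be worth keeping in mind when comparing against a non-checkpointed run.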

salahelabyad · October 28, 2020, 5:11pm

I looked into torch.utils.checkpoint, and it does satisfy what I needed.

I also tried searching again with different keywords and found the repository https://github.com/IBM/pytorch-large-model-support, which enables memory swapping between CPU memory and GPU memory (similar to swapping between CPU RAM and storage). I will leave the link here; maybe it can help someone later on.
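
As a rough illustration of the same idea (keeping the tensors saved for backward in CPU memory), newer PyTorch releases ship a built-in context manager, `torch.autograd.graph.save_on_cpu`; I believe it arrived around version 1.10. This is not the API of the IBM repository above, just a sketch of the swapping concept, and the tiny network and input shape are made up for the example.

```python
import torch
import torch.nn as nn
from torch.autograd.graph import save_on_cpu

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical small 3D-conv network, for illustration only.
net = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm3d(8),
    nn.ReLU(),
    nn.Conv3d(8, 8, kernel_size=3, padding=1),
).to(device)

# Assumed input layout: (batch, channels, frames, height, width).
clip = torch.randn(2, 3, 8, 16, 16, device=device)

# Tensors that autograd saves for backward inside this block are moved
# to CPU memory and copied back to the original device when needed.
# Pinned memory only makes sense when a GPU is actually present.
with save_on_cpu(pin_memory=torch.cuda.is_available()):
    out = net(clip)

out.mean().backward()
```

The extra host-device copies cost time, so this trades speed for GPU memory, much like checkpointing trades compute for memory.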

I will look into Microsoft's releases; if I happen to find anything related to the topic, I’ll try to remember to update the post.

Thanks for the help

ptrblck · October 29, 2020, 1:26am

Ah no, you were right. It was indeed the linked repository by IBM; I just misremembered it, which also explains why I couldn’t find it. Thanks for the link.

Based on the last commits it seems that PyTorch 1.5.0 is at least supported.

m-yahya-khattak · March 3, 2022, 2:12pm

I’m having the exact same issue, but I can’t figure out how to set up this module (pytorch-large-model-support) with my code.