Issue allocating a large tensor to the GPU

Hello,

I’m new to PyTorch and have an issue with moving a large tensor to the GPU. The tensor is larger than the GPU’s 16 GB of memory (and I have multiple such tensors for training/val/test). I suppose this is a common issue that many others have already solved. Any suggestions?

I read a related post here, which hasn’t been answered.

Thanks in advance!

You could try a model-parallel approach (i.e. splitting the operation into separate parts) to save memory, since you won’t be able to allocate more device memory than is available.
The implementation depends on the actual operation.
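Purely as an illustration, and assuming the operation can be applied independently to slices along one dimension, the splitting could look something like this (the helper name, `op`, and the chunk size are made up):

```python
import torch

def process_in_chunks(big_cpu_tensor, op, chunk_size=100_000):
    # Keep the full tensor on the CPU and move only one slice at a time
    # to the GPU, so device memory never holds more than `chunk_size` rows.
    outputs = []
    for chunk in big_cpu_tensor.split(chunk_size, dim=0):
        result = op(chunk.to('cuda'))    # run the operation on a small slice
        outputs.append(result.cpu())     # bring the result back to free GPU memory
    return torch.cat(outputs, dim=0)
```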

Thank you for the reply! The problem was solved by simply moving each batch to the GPU during training instead of sending all the data at once. The model parallel approach also looks interesting, though. Is there a PyTorch tutorial for that?
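Roughly what worked, as a sketch (the tensors, sizes, and batch size below are just placeholders):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

device = torch.device('cuda')

# Placeholder CPU tensors standing in for the real data; the full tensors
# stay in host memory and only one batch is copied to the GPU at a time.
features_cpu = torch.randn(100_000, 128)
labels_cpu = torch.randint(0, 10, (100_000,))

dataset = TensorDataset(features_cpu, labels_cpu)
loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)

for inputs, targets in loader:
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    # ... forward/backward pass on this batch only ...
```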

Here is a tutorial for simple model parallelism.
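For reference, the basic pattern from that tutorial looks roughly like this. This sketch assumes two GPUs (cuda:0 and cuda:1); the class name and layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    # Each part of the model (and its intermediate activations) lives on a
    # different device, so no single GPU has to hold everything.
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(1024, 512).to('cuda:0')
        self.part2 = nn.Linear(512, 10).to('cuda:1')

    def forward(self, x):
        x = torch.relu(self.part1(x.to('cuda:0')))
        return self.part2(x.to('cuda:1'))   # move activations to the second GPU

model = SplitModel()
out = model(torch.randn(8, 1024))   # the input can start on the CPU
```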
