Is it possible to load a tensor larger than the memory of a single GPU?


Say I have 2 GPUs with 8 GB of memory each. I can then use nn.DataParallel with device_ids=[0, 1] to run on multiple GPUs.

In that case I can load my_tensor onto the GPU with my_tensor.cuda(0), since by convention all input tensors must be on the first GPU in device_ids.

This works as long as my_tensor is smaller than 8 GB; otherwise I get an out-of-memory (OOM) exception. Is there a way to load tensors larger than 8 GB (i.e. larger than a single GPU's memory, say my_tensor ~ 12 GB) so that they are shared across the memory of both GPUs?

In other words, if I have to forward-pass 2 tensors of 8 GB each through the model, I get an OOM exception on the second call, model(my_tensor2.cuda(0)).
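
To make the sizes concrete, here is a rough sketch of how I estimate a tensor's footprint. The helper names (tensor_bytes, fits_on_gpu) and the shape are only illustrative, and it assumes a dense tensor and counts 1 GB as 1024**3 bytes; it ignores model weights, activations, and allocator overhead.

```python
import math

# Bytes per element for a few common dtypes (dense storage assumed).
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4, "float64": 8}

GIB = 1024 ** 3


def tensor_bytes(shape, dtype="float32"):
    """Approximate memory needed for a dense tensor of the given shape."""
    return math.prod(shape) * BYTES_PER_ELEMENT[dtype]


def fits_on_gpu(shape, gpu_bytes, dtype="float32"):
    """True if the tensor data alone is no larger than gpu_bytes."""
    return tensor_bytes(shape, dtype) <= gpu_bytes


# Hypothetical shape: 3 * 2**30 float32 values, i.e. about 12 GB of data.
shape = (3, 1024, 1024, 1024)

print(tensor_bytes(shape) / GIB)        # size in GB
print(fits_on_gpu(shape, 8 * GIB))      # against one 8 GB GPU
print(fits_on_gpu(shape, 2 * 8 * GIB))  # against the combined 16 GB
```

The last check only compares the tensor against the combined capacity of both GPUs; it does not mean a single tensor can actually be placed across the two devices, which is what I am asking about.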