CUDA out of memory error for large data sizes for SVD on GPU

I run into a CUDA out-of-memory error when I increase the data size for torch.svd on the GPU.

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "svd-gpu.py", line 9, in <module>
    u, s, v = torch.svd(x, some=True);
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.cu:66

torch.svd(x) works well when x is a 3M x 300 matrix but fails when x is 3.5M x 300.
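For reference, the failing call is essentially the following (a minimal sketch; the random tensor here is a stand-in for my actual data):

import torch

# Stand-in data: my real x is a 3.5M x 300 float32 matrix (~4.2 GB).
x = torch.randn(3500000, 300).cuda()

# This is the call that dies with the out-of-memory error above.
u, s, v = torch.svd(x, some=True)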

I have a total of 8 Tesla K80 GPUs, each with 12 GB of memory. Having access to all 8 GPUs doesn't seem to make a difference, so I tried the following experiment.

I set CUDA_VISIBLE_DEVICES to 0; nvidia-smi then shows 11392 MiB (out of 11439 MiB) consumed on GPU 0, and the program fails. This I understand.

I set CUDA_VISIBLE_DEVICES to 0,1; nvidia-smi then shows 8245 MiB (out of 11439 MiB) consumed on GPU 0 and around 2 MiB (out of 11439 MiB) consumed on GPU 1, and the program still fails.

Does having more GPUs or more memory have no effect on SVD? How does the calculation happen? Regardless of how many GPUs are available, is only one GPU used to compute the SVD in PyTorch?

Combining GPUs automatically is not easy to do, and PyTorch does not do it implicitly.
Hence, regardless of how many GPUs you have, the SVD runs on only one GPU and goes out of memory.
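To make that concrete: a PyTorch operation runs entirely on the device that holds its input tensor. For rough scale, a 3.5M x 300 float32 tensor is about 4.2 GB on its own, and with some=True the factor u has the same shape as x, so the input, the outputs, and the solver's workspace together can easily push past the ~11.4 GiB of a single K80. A small sketch of the single-device behavior:

import torch

x = torch.randn(1000, 300).cuda(0)  # the tensor lives entirely on GPU 0
u, s, v = torch.svd(x, some=True)   # so the SVD is computed on GPU 0 alone
print(u.get_device())               # prints 0: the factors stay on that device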

Is there a workaround/method to make SVD use all of the existing GPUs?

The only quick workaround I can think of is to move the tensor to the CPU and do the SVD on the CPU.
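Something along these lines (a sketch; the random tensor is a stand-in for your data, and moving the factors back is optional):

import torch

# If x already lives on the GPU, bring it back with x = x.cpu() first.
x = torch.randn(3500000, 300)           # stand-in data, kept in host RAM
u, s, v = torch.svd(x, some=True)       # runs on the CPU, limited by host RAM
u, s, v = u.cuda(), s.cuda(), v.cuda()  # optional: move the factors to a GPU

It will of course be slower than the GPU path, but it trades the ~12 GB per-device limit for your host RAM.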

I am not just looking for some way to compute the SVD of a large data set; I'm specifically looking for a GPU-based implementation of SVD that scales across multiple GPUs.

But thank you for the clarification.