GPU computation of a large array

Hi Everyone,

I am converting 10GB of CT scan images to an array and then transforming it into a torch tensor. I intend to do some mathematical operations with GPU parallelization on this big tensor, but it is too large to move to GPU memory. Is there any method to do this operation in PyTorch?

Thanks, everyone.

It depends on the available memory of your GPU.
If you have a newer GPU with 16 or 32GB, some operations might be possible on this tensor.
Depending on the operation, you might need to create an output tensor, which could take the same amount of memory.

Which operations are you interested in?

Thanks for the kind reply. My workstation only has 4GB of GPU memory; that is the problem.
I wonder if the data can be read and processed on the GPU piece by piece, so that not all of it has to be loaded into GPU memory at once. Is this functionality available in PyTorch?

Thanks a lot.

If you can split the operation into chunks of data, it’s possible.
E.g. if you would like to sum a tensor, you could slice the original tensor, push each slice to the device, sum the values, and transfer the result back to the host.
Once all slices are done, you could sum all the temporary results.

Note that you add overhead due to the host2device and device2host copies, and depending on your operation and workload, it might be faster on the CPU, so you should profile both approaches.
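To make this concrete, here is a minimal sketch of the chunked-sum idea; the function name `chunked_sum` and the chunk size are placeholders I made up, and the same pattern applies to other reductions or elementwise ops:

```python
import torch

def chunked_sum(big_tensor, chunk_size=1_000_000, device="cuda"):
    # Slice the large CPU tensor, push each slice to the GPU,
    # reduce it there, and copy the partial result back to the host.
    flat = big_tensor.reshape(-1)
    partial_sums = []
    for start in range(0, flat.numel(), chunk_size):
        chunk = flat[start:start + chunk_size].to(device)  # host2device copy
        partial_sums.append(chunk.sum().cpu())             # device2host copy
    # Combine the per-chunk results on the CPU.
    return torch.stack(partial_sums).sum()

# Dummy example; the real use case would load the CT volume instead.
x = torch.randn(10_000_000)
print(chunked_sum(x))
```

You can tune the chunk size so each slice (plus any intermediate output) fits comfortably in your 4GB of GPU memory, and compare the timing against a plain CPU `x.sum()` to see whether the transfers are worth it.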

Thanks for the advice. I’ll give it a try.