I want to use CPU RAM as swap space for GPU RAM to allow oversubscription. Is there a way to implement this in CUDA or C++?
As a first step, I tried replacing every cudaMalloc with cudaMallocManaged and tested it with a simple GEMM on a Pascal GPU. I increase the size of the matrix and check when I get a memory error. Right now both versions crash at the same matrix size, so we don't really get any oversubscription. Do you have any thoughts on this?
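For reference, here is a minimal sketch of the kind of oversubscription test I would expect to work on Pascal (assumptions: Linux, CUDA 8 or newer; the 1.5x factor and the `touch` kernel are illustrative, not from the original post):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop so one modest launch covers the whole allocation.
__global__ void touch(float *p, size_t n) {
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
         i < n;
         i += (size_t)gridDim.x * blockDim.x)
        p[i] = 1.0f;  // first touch faults each page onto the GPU
}

int main() {
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);

    // Ask for 1.5x the physical GPU memory; on Pascal+/Linux the
    // driver should page the excess to host RAM instead of failing.
    size_t n = (size_t)(totalB * 1.5) / sizeof(float);
    float *p = nullptr;
    if (cudaMallocManaged(&p, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }

    touch<<<1024, 256>>>(p, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("touch over %zu MiB: %s\n",
           n * sizeof(float) >> 20, cudaGetErrorString(err));
    cudaFree(p);
    return 0;
}
```

If this standalone test also crashes at the physical memory limit, the problem is likely in the environment (OS, driver) rather than in the cudaMalloc-to-cudaMallocManaged replacement itself.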
@kouyoumin thank you so much for your post - I am trying to replicate this with the most recent version of PyTorch, and it looks like the spots where you edited the code are different now - any pointers as to where these changes could be made? (Really excited to try this on our new NVSwitch'd A100s to see if unified memory can truly act as "one giant GPU" with a contiguous address space, as NVIDIA claims…)
@mohotmoz My idea was simple: replace cudaMalloc with cudaMallocManaged. However, there's still a check preventing a CUDA op from accessing tensors on another CUDA device, so I modified that logic to allow the managed case.
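In case it helps, here is a hypothetical illustration (not the actual PyTorch source; `isManaged` and `deviceAccessOk` are made-up names) of how such a same-device check could be relaxed for managed pointers, using cudaPointerGetAttributes to detect a cudaMallocManaged allocation:

```cpp
#include <cuda_runtime.h>

// Hypothetical helper, not PyTorch code: report whether a pointer
// comes from cudaMallocManaged.
bool isManaged(const void *ptr) {
    cudaPointerAttributes attr{};
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess)
        return false;
    return attr.type == cudaMemoryTypeManaged;  // 'type' field: CUDA 10+
}

// Sketch of the relaxed condition: same device as before, OR the
// storage is managed and therefore reachable from any GPU.
bool deviceAccessOk(const void *ptr, int tensorDevice, int currentDevice) {
    return tensorDevice == currentDevice || isManaged(ptr);
}
```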
I have updated the code for the recent changes in PyTorch.