I have encountered an odd problem:
My lab has a server with four 1080 Ti GPUs (about 12 GB each) shared by multiple users. I have installed CUDA 9.0, cuDNN 7.4.3 for CUDA 9.0, and PyTorch 0.4.1.
When I create a random tensor, whether small or large, printing or copying it raises an error: "cuda runtime error: out of memory". I checked GPU usage: the GPUs are barely utilized and have more than enough free memory for such a tensor.
Create a tensor:
Print this tensor:
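What I did is roughly the following (a minimal sketch of my reproduction; the device index `cuda:2` is just the one I happened to use, and the `is_available` guard is added here so the snippet also runs on a CPU-only machine):

```python
import torch

# Creating the tensor on the CPU works fine.
x = torch.randn(10, 10)

if torch.cuda.is_available():
    # Moving it to the GPU also appears to succeed...
    x = x.to("cuda:2")

# ...but this step is where "cuda runtime error: out of memory" is raised.
print(x)
```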
Then I thought it might be a problem with NVIDIA's CUDA installation, so I reinstalled the same versions of CUDA and cuDNN. After that it worked: I could print and copy the tensor successfully.
Then I took a break, and when I came back and tried the same thing, it failed with the same error as before. One thing I should add: after I reinstalled CUDA and cuDNN, my friend started running her TensorFlow code on GPU 0 (I don't know whether that matters).
In addition, when I create the tensor and move it to a GPU device, it really does occupy GPU memory, but printing or copying it still fails. For example, with the code shown above, I create a 10x10 tensor and move it to GPU 2; the resulting GPU usage is as follows. That also seems weird, because it occupies far too much memory for such a small tensor.
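For comparison, the tensor's own data is tiny, so most of the reported usage is presumably not the tensor itself but the per-process CUDA context plus PyTorch's caching allocator. A quick way to check what PyTorch has actually allocated (a sketch; `torch.cuda.memory_allocated` exists in 0.4.x, and device index 2 is just my example):

```python
import torch

# A 10x10 float32 tensor holds only 10 * 10 * 4 = 400 bytes of data.
tensor_bytes = 10 * 10 * torch.ones(1).element_size()
print(tensor_bytes)

if torch.cuda.is_available():
    # Bytes currently held by tensors on GPU 2, as seen by PyTorch.
    # Comparing this to nvidia-smi's number shows how much of the usage
    # is context/allocator overhead rather than the tensor itself.
    print(torch.cuda.memory_allocated(2))
```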