First call to `torch.Tensor([5]).cuda()` hangs for a long time

I’m running CUDA 7.5 on Ubuntu 16.04. For some reason, the first call to `.cuda()` hangs for a few minutes. I’ve looked at some of the other related posts, but haven’t found anything that helps me diagnose the issue. Could anyone give me some pointers? Is there any extra information I could provide that would be useful?
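For a question like this, the most useful extra information is usually the Python version, the PyTorch version, the CUDA version PyTorch was built against, and the GPU's compute capability. The script below is a minimal sketch for collecting those. It assumes a PyTorch release that provides `torch.version.cuda`, `torch.cuda.get_device_name` and `torch.cuda.get_device_capability` (very old builds may lack some of these), and the `collect_env_info` name is just illustrative. It degrades gracefully when PyTorch or a GPU is missing, and it avoids f-strings so it also runs on Python 2.7.

```python
import platform


def collect_env_info():
    """Gather version details that help diagnose a slow first `.cuda()` call.

    Fields that cannot be determined (no PyTorch, no GPU) are left as None.
    """
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,   # CUDA version PyTorch was built against
        "gpu": None,
        "capability": None,   # (major, minor) compute capability of GPU 0
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_env_info().items()):
        print("%s: %s" % (key, value))
```

Pasting this output into a post makes it easy to spot a mismatch between the CUDA build and the GPU generation.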

Thanks!
Ben

Some more info which may be helpful:

I’m running a GTX 1060 (Pascal) with CUDA 7.5. I’ve noticed that the first call to `.cuda()` causes GPU memory usage to rise to about 200 MB before returning. Python also maxes out a CPU core and uses about 1 GB of RAM until the call returns. I don’t know if it makes a difference, but I installed PyTorch via conda.

Another note:

All of these issues have been present while using Python 2.7. I upgraded PyTorch and CUDA to CUDA 8.0 and the issues persist, but everything works fine with Python 3.5.

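The GTX 1060 needs at least CUDA 8.0 to avoid triggering CUDA’s JIT compilation of kernels. CUDA 7.5 cannot generate native code for Pascal GPUs, so on first use the driver has to compile the kernels for your card, which would explain the busy CPU core and the long wait. If you use the CUDA 8.0 packages, you should be fine (regardless of the Python version).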
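The rule above can be sketched as a simple comparison between the GPU's compute capability and the newest architecture a CUDA toolkit can emit native code for. This is a simplification under stated assumptions: the table values (CUDA 7.5 up to 5.3, CUDA 8.0 up to 6.2) are the toolkit limits as I understand them, and a given PyTorch build may be compiled for fewer architectures than its toolkit supports. The `expects_ptx_jit` name is hypothetical.

```python
# Assumed: newest compute capability each CUDA toolkit can compile native
# code for. A specific binary (e.g. a PyTorch package) may include fewer.
MAX_NATIVE_CAPABILITY = {
    "7.5": (5, 3),
    "8.0": (6, 2),
}


def expects_ptx_jit(cuda_version, capability):
    """Return True if code built with `cuda_version` cannot contain native
    kernels for a GPU of `capability`, so the driver must JIT-compile
    embedded PTX the first time those kernels are used."""
    max_cap = MAX_NATIVE_CAPABILITY.get(cuda_version)
    if max_cap is None:
        raise ValueError("unknown CUDA version: %s" % cuda_version)
    return tuple(capability) > max_cap


GTX_1060 = (6, 1)  # Pascal
print(expects_ptx_jit("7.5", GTX_1060))  # True: JIT on first use
print(expects_ptx_jit("8.0", GTX_1060))  # False: native code is possible
```

The driver caches JIT output, so later runs are usually faster, but using a build that ships native code for the card avoids the one-time compile entirely.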

Thank you! That helps.