Setting up PyTorch environment w/Win10, Cuda

cdrouin · July 14, 2019, 9:20pm

Hi, all. I’m trying to get PyTorch up and running locally on a Win10 laptop, and I’ve been having a fair bit of difficulty; everything crashes and burns when I hit a call to torch._C._cuda_init() with a rather unhelpful runtime error (RuntimeError: CUDA error: unknown error). torch.cuda.is_available() returns True, and I am able to set the device to “cuda:0”.

Here’s how my laptop is currently set up:

GPU: GeForce GTX 1050ti (Max-Q variant)
GPU driver version: 431.36
CUDA Version: 10.1 (I’ve installed the 10.0 archival version from NVIDIA’s site, but 10.1 shows up when I run nvidia-smi. I did originally start with the 10.1 installer, but tried to uninstall all the components; is there something I need to do to get a clean uninstall?)
PyTorch + torchvision were installed via conda using conda install pytorch torchvision cudatoolkit=10.0 -c pytorch; pytorch shows as version 1.1.0 with build py3.7_cuda100_cudnn7_1.
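For anyone comparing their own setup against the list above, here is a minimal sketch that reports what the installed torch build says about itself. The helper name describe_torch_build is made up for illustration. Note that the CUDA version shown by nvidia-smi is the highest version the driver supports, whereas torch.version.cuda is the version the PyTorch binary was built against, so the two may legitimately differ.

```python
import importlib.util


def describe_torch_build():
    """Return basic facts about the installed torch build, or None if torch is absent.

    describe_torch_build is a hypothetical helper name, not part of PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # getattr guards cover a partially installed package that lacks submodules.
    version_mod = getattr(torch, "version", None)
    cuda_mod = getattr(torch, "cuda", None)
    return {
        "torch_version": getattr(torch, "__version__", None),
        "built_for_cuda": getattr(version_mod, "cuda", None),
        "cuda_available": cuda_mod.is_available() if cuda_mod is not None else None,
    }


print(describe_torch_build())
```

With the conda build described above, built_for_cuda would be expected to read 10.0 even though nvidia-smi shows 10.1.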

I tried uninstalling my drivers entirely and just installing the CUDA 10.0 toolkit, but on doing so nvidia-smi reported that it couldn’t communicate with the drivers. Do I need to try to roll back to an earlier GPU driver version? Is there something in particular I need to do to get vestiges of the CUDA 10.1 toolkit off of my machine? Should I take off and nuke it all from orbit?

ptrblck · July 14, 2019, 10:43pm

Could you try to install the latest pytorch nightly build as suggested in this issue?
Let us know if that doesn’t help and you still get this error.

cdrouin · July 14, 2019, 11:41pm

Hm. I uninstalled the stable version and installed the nightly, but now things appear to be broken further. The torch module seems to be mostly empty when I inspect it with help('torch'), showing only the submodules nn and utils - no cuda, which means that the second we hit a torch.cuda reference it falls over.
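A quicker check than reading the help('torch') output is to ask the import system which submodules it can locate. The sketch below is only an illustration: missing_submodules is a hypothetical helper, and the list of names passed to it is an assumption about what a complete install should contain.

```python
import importlib.util


def missing_submodules(package, names):
    """Return the names from `names` that cannot be located under `package`.

    Returns None if the package itself is not installed. Looking up a dotted
    name imports the parent package, so a badly broken package may raise here.
    """
    if importlib.util.find_spec(package) is None:
        return None
    return [
        name
        for name in names
        if importlib.util.find_spec(f"{package}.{name}") is None
    ]


# The submodule names here are illustrative, not an official checklist.
print(missing_submodules("torch", ["cuda", "nn", "utils"]))
```

An empty list means every listed submodule was found; a list containing "cuda" would match the symptom described above.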

If it helps, I installed torch-nightly via conda install pytorch-nightly cudatoolkit=10.0 -c pytorch, and it installed pytorch/win-64::pytorch-nightly-1.2.0.dev20190714-py3.7_cuda100_cudnn7_0.

Having said that, the nightly that was mentioned as working in that thread is from February. I’m going to see if I can pull an older version of pytorch (either one of the older nightlies or 1.0.0, which was mentioned to be working in that thread) and see where that takes me.

edit: It’s a hack, but making a call to torch.cuda.current_device() (as mentioned in that thread) appears to resolve this issue.
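A minimal sketch of that workaround, assuming it is placed at the top of the script before any other CUDA work. The wrapper name warm_up_cuda is hypothetical; the only PyTorch calls involved are torch.cuda.is_available() and torch.cuda.current_device(). Why the early call helps is not explained in this thread, so treat it as a stopgap rather than a fix.

```python
import importlib.util


def warm_up_cuda():
    """Call torch.cuda.current_device() early and return the device index.

    Returns None when torch is not installed or CUDA is reported unavailable.
    warm_up_cuda is a hypothetical wrapper name used for illustration.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    # The workaround itself: query the current device before anything else.
    return torch.cuda.current_device()


print(warm_up_cuda())
```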

ptrblck · July 15, 2019, 8:57am

Without this line of code you cannot call any CUDA functions without raising an error?

CC @peterjc123: could this be related to the linked issue (which should have been resolved)?

peterjc123 · July 15, 2019, 9:36am

I don’t think an incomplete Python package would get uploaded. We run some basic smoke tests before uploading these packages, and importing torch.cuda is one of them. The size of the package also looks normal, at around 500MB. However, I’ll check it later.

As for the problem, have you completely removed the old installation? What if you run conda uninstall pytorch, pip uninstall torch, conda uninstall pytorch-nightly and pip uninstall torch-nightly in a row and then install it again? Also, would you please check whether there is any pytorch installation in your PYTHONPATH?
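One way to check for leftover copies after running those uninstall commands is to look for a torch directory under every entry on sys.path. This is a rough sketch, not an official tool, and find_package_dirs is a hypothetical helper name.

```python
import os
import sys


def find_package_dirs(name, search_paths=None):
    """List directories called `name` found directly under each search path.

    Defaults to sys.path. More than one hit suggests a leftover installation.
    """
    paths = sys.path if search_paths is None else search_paths
    found = []
    for entry in paths:
        base = entry if entry else os.getcwd()  # '' on sys.path means the cwd
        candidate = os.path.join(base, name)
        if os.path.isdir(candidate) and candidate not in found:
            found.append(candidate)
    return found


print(find_package_dirs("torch"))
```

After a complete uninstall the list should be empty; after a clean reinstall it should contain exactly one directory.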

cdrouin · July 15, 2019, 1:44pm

I was able to call torch.cuda.is_available() and torch.device() (not strictly a CUDA function, I think, but was using it to set the device to “cuda:0”) without anything blowing up.

@peterjc123, I’d uninstalled pytorch + torchvision + pytorch-nightly before attempting the (re)install operation. Note that I’m using conda, not pip, in case that has anything to do with the issue. To get torchvision (re)installed on top of pytorch-nightly, I had to use the --no-deps flag, since it otherwise requires pytorch to be installed.

It appears I don’t actually have a PYTHONPATH environment variable on this system (FWIW - installed using the Anaconda graphical installer); I don’t see pytorch referenced in the normal PATH variable either.
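The same check can be scripted rather than done by eye. The sketch below scans PYTHONPATH and PATH for entries containing a given substring; entries_mentioning is a hypothetical helper, and matching on the substring "torch" is a heuristic, not a guarantee that an entry is a PyTorch install.

```python
import os


def entries_mentioning(needle, env=None):
    """Return PYTHONPATH and PATH entries that contain `needle` (case-insensitive).

    An unset variable is treated as empty, matching a system with no PYTHONPATH.
    """
    env = os.environ if env is None else env
    hits = {}
    for var in ("PYTHONPATH", "PATH"):
        parts = [p for p in env.get(var, "").split(os.pathsep) if p]
        hits[var] = [p for p in parts if needle.lower() in p.lower()]
    return hits


print(entries_mentioning("torch"))
```

Two empty lists correspond to the situation described above: no PYTHONPATH set and nothing torch-related on PATH.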

peterjc123 · July 16, 2019, 1:17am

Yes, I know. I just want to ensure torch is uninstalled. Using pip uninstall is harmless since it will ask for your confirmation. If you are sure the package is completely removed, then it is likely that the package you downloaded is incomplete or broken. Please remove the cached files in [Anaconda Root]\pkgs and try again. Alternatively, you can download the file from https://anaconda.org/pytorch/pytorch-nightly/files and install it locally.
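Before deleting anything, it may help to see which cached entries are actually present. The sketch below only lists them and deletes nothing. cached_packages is a hypothetical helper, the "pytorch" prefix is an assumption about how the cached entries are named, and the path to the pkgs folder has to be supplied by the reader since it depends on where Anaconda was installed.

```python
import os


def cached_packages(pkgs_dir, prefix="pytorch"):
    """Return the sorted names in `pkgs_dir` that start with `prefix`.

    Returns an empty list when the directory does not exist. Nothing is removed.
    """
    if not os.path.isdir(pkgs_dir):
        return []
    return sorted(
        entry for entry in os.listdir(pkgs_dir) if entry.startswith(prefix)
    )


# Example call; replace the argument with your own [Anaconda Root]\pkgs path.
# print(cached_packages(r"<Anaconda Root>\pkgs"))
```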