Using .cuda() is very slow on pytorch=0.2.0

Due to compatibility issues, I am using pytorch=0.2.0 with python=2.7

I installed it using conda install pytorch=0.2.0 cuda80 -c soumith, since it was pointed out on the forum that this reduces the lag when calling .cuda() for the first time. However, I do not see any improvement, and loading still takes ~3 mins.

(P.S. I’m using a Tesla V100.)

It has been some time since PyTorch 0.2, but I can’t remember it being that slow. It should take a few seconds at most, not minutes. I’m not sure what’s causing the problem in your case; could it be a bug in PyTorch 0.2?

PyTorch 0.3 and 0.4 also work with Python 2.7. Regarding the compatibility issues you mentioned: the Tesla V100 should work with CUDA 8 & 9, so if you can somehow install 0.3 or 0.4 alongside the 0.2 version, it would help in figuring out whether it’s a PyTorch 0.2-specific bug or something else.

I remember this issue occurring when the wrong CUDA version was installed, so that on the first run PyTorch gets recompiled for your GPU.
Unfortunately, there is no CUDA 9 build for PyTorch 0.2.0.

However, could you print torch.version.cuda after your first cuda run?

@rasbt’s ideas sound good for debugging your issue!
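For anyone who wants to reproduce the measurement, here is a minimal sketch that times the first and second .cuda() calls and then prints the CUDA version the build reports. The helper names time_call and report_first_cuda_call are made up for illustration, and getattr is used because older builds may not expose torch.version.cuda at all.

```python
import time


def time_call(fn):
    """Call fn() once and return (result, elapsed_seconds)."""
    start = time.time()
    result = fn()
    return result, time.time() - start


def first_cuda_transfer(torch):
    """Move a tiny tensor to the GPU and wait until the work is done."""
    tensor = torch.zeros(1).cuda()
    torch.cuda.synchronize()
    return tensor


def report_first_cuda_call():
    """Hypothetical helper: compare the first and second .cuda() calls."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return None
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return None
    # The first call pays for CUDA initialization and any kernel compilation.
    _, first = time_call(lambda: first_cuda_transfer(torch))
    # The second call should show the steady-state cost.
    _, second = time_call(lambda: first_cuda_transfer(torch))
    # Older builds may lack this attribute, so fall back to None.
    cuda_version = getattr(torch.version, "cuda", None)
    print("first .cuda():  %.2f s" % first)
    print("second .cuda(): %.2f s" % second)
    print("torch.version.cuda: %r" % (cuda_version,))
    return first, second, cuda_version


if __name__ == "__main__":
    report_first_cuda_call()
```

If the first call takes minutes while the second takes milliseconds, the time is going into one-off setup rather than the transfer itself.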


I tried torch.version.cuda after the first .cuda(), but it seems that torch.version doesn’t have a cuda attribute. It shows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'cuda'

The Tesla V100 needs CUDA 9. CUDA 9 is incompatible with PyTorch 0.2.0, even if you build 0.2.0 from source.

The solution is to upgrade to 0.4.0 (0.3.0 might work, but I’m like 60% 20% sure you need 0.4.0).

You’ll notice that your ~/.nv directory is probably increasing in size without bound, right?
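A quick way to check this is to measure the directory before and after a run. The sketch below only uses the standard library; dir_size_bytes is a made-up helper name, and the path assumes the default cache location under ~/.nv on Linux.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so that nothing outside the tree is counted.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    cache_dir = os.path.expanduser("~/.nv")  # assumed default location
    size_mib = dir_size_bytes(cache_dir) / (1024.0 * 1024.0)
    print("%s: %.1f MiB" % (cache_dir, size_mib))
```

Running it before and after a script that calls .cuda() should show whether the cache keeps growing on every run.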

Yeah, that was my mistake. I think the attribute was introduced after 0.2.0.

The same lag is happening on another machine as well, using a GTX 1060 with Python 3.5.4 and pytorch 0.3.0 py35cuda8.0cudnn6.0_0.

The output of torch.version.cuda after the first .cuda() is '8.0.61'.

Is there any way to reduce the lag here?

In case a future reader is interested: I solved it by using PyTorch 0.3.1 compiled with CUDA 9, installed using

pip install

@ptrblck, greetings.

It seems that I’m facing this issue when using nvidia-docker2, and I thought that maybe you could help figure it out.
In particular, could it be that nvidia-docker2 triggers this recompilation because of its design?
Could you take a look at the Stack Overflow issue?

It’s not a pure PyTorch issue and may be an nvidia-docker one, but maybe from a quick look at the nvidia-docker design you could draw some useful conclusions about this issue in my setup and environment.

In particular, do you think I should try installing the CUDA 8.0 drivers on my host and see whether I can connect those drivers to the Docker container (I’m not sure it’s possible to manage drivers that way, but that’s another question for CUDA, I guess)?

Oh, I forgot that my Ubuntu 20 doesn’t support CUDA 8.0.
Well, anyway, it would be useful if you could take a look and tell me on which end the problem seems to be in that case.

No, it’s not a Docker feature; the CUDA compiler will JIT-compile code (if possible) when an unsupported architecture is detected.

No, I would not recommend using CUDA 8 or PyTorch 0.2, as both are now a few years old.
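To see whether a given install would hit this JIT path, one rough check is to compare the GPU’s compute capability with the architectures the binary was built for. The sketch below is a simplified illustration: classify is a made-up helper, the matching rule (same major version for sm_ entries, any equal-or-older compute_ entry for PTX) is a simplification, and torch.cuda.get_arch_list() is assumed to be available, which is only the case in recent PyTorch releases and not in 0.2.

```python
import re

# Matches entries such as "sm_61" or "compute_70".
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)[a-z]*$")


def classify(capability, arch_list):
    """Return 'native', 'jit', or 'unsupported' for a (major, minor) device."""
    major, minor = capability
    ptx_usable = False
    for arch in arch_list:
        match = _ARCH_RE.match(arch)
        if match is None:
            continue
        kind = match.group(1)
        a_major, a_minor = int(match.group(2)), int(match.group(3))
        # Precompiled kernels run on the same major version, same or newer minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return "native"
        # PTX for an older or equal architecture can be JIT-compiled at load time.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            ptx_usable = True
    return "jit" if ptx_usable else "unsupported"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if (torch is not None and torch.cuda.is_available()
            and hasattr(torch.cuda, "get_arch_list")):
        capability = torch.cuda.get_device_capability(0)
        print(classify(capability, torch.cuda.get_arch_list()))
```

For example, with a hypothetical arch list of ["sm_60", "sm_61", "compute_61"], a V100 (capability (7, 0)) is classified as "jit", which would match the slow first call described in this thread.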
