Model.cuda() takes a long time

Even for a small model, calling its cuda() method takes minutes to finish. Is this normal? If so, what does it do behind the scenes that takes so long?
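
For reference, this is roughly how I am timing it; the model here is just a small placeholder, not my actual network:

import time
import torch
import torch.nn as nn

# a tiny stand-in model; the real one is comparably small
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

start = time.time()
model.cuda()  # this single call is what takes minutes
print("model.cuda() took %.1f seconds" % (time.time() - start))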

Thanks!


We’re aware that there are problems with the binary packages we’ve been shipping that only appear on certain GPU architectures. Would you mind sharing which GPU you have? I believe this has already been fixed, so a reinstall might help you.

Ping @smth

I upgraded PyTorch from 0.1.6-py35_19 to 0.1.6-py35_22. It still took 2.5 minutes to run my_small_model.cuda(). I am using a Conda Python 3.5 env and a GTX 1080 with CUDA 8.0.

Thanks for your quick reply!

I will post an update here when this is fixed. I hope to get a fix out today or tomorrow.

Thank you @smth, Looking forward to the fix!

I am using a Conda Python 2.7 env with a GTX 1080 and CUDA 8.0 as well, and PyTorch was installed from the binary packages. Even during training it was much slower than Lua Torch’s nn. Is this problem related to my GPU, or is it just a matter of my code?

Thanks.

During training there is a part where the data is moved to the GPU with .cuda(). Does data.cuda() have the same problem?
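
Concretely, the part I mean looks roughly like this (the shapes are placeholders, not my real data):

import torch

# inside the training loop; inputs/targets here are just dummy tensors
inputs = torch.randn(32, 100)
targets = torch.randn(32, 10)
inputs = inputs.cuda()    # does this pay the same startup cost as model.cuda()?
targets = targets.cuda()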

Thanks

This should be fixed now if you use the following command (the website is updated):

conda install pytorch torchvision cuda80 -c soumith

Tested the new release and it works great. Thanks a lot for the super responsive fix!

This problem appears again on my GPU server. I was using a Titan X with torch-0.1.11.post5-cp27 and CUDA 8.0, installed from pip using the wheel.
But when I uninstalled it and reinstalled torch-0.1.10.post2, it worked fine.

@chenyuntc check the latest comments I posted on https://github.com/pytorch/pytorch/issues/537; maybe that’s the issue

Hello, I just encountered this problem as well. I am running PyTorch on a cluster. Here is the information about the system and the code I am running:

  1. Python 2.7 and CUDA 7.5 with the latest PyTorch.
  2. The GPU I am using is a 1080 Ti.
  3. My model is a simple bidirectional layered GRU followed by two linear layers (roughly like the sketch below). The Model.cuda() call can take about 10 minutes. I am not sure if this is normal.
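
For context, the sketch below shows roughly what the model looks like; the sizes are made up for illustration, not my real hyperparameters:

import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    def __init__(self, input_size=64, hidden_size=128, num_layers=2, num_classes=10):
        super(BiGRUClassifier, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers,
                          batch_first=True, bidirectional=True)
        self.fc1 = nn.Linear(2 * hidden_size, hidden_size)  # 2x for both directions
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.gru(x)
        out = self.fc1(out[:, -1, :])  # take the last time step
        return self.fc2(out)

model = BiGRUClassifier()
model.cuda()  # this is the call that takes about 10 minutes on the cluster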

Thanks in advance for any help!!!

I don’t want to hijack the thread, but at least the title fits.

The case here is that the home folder where my script lives is NFS-mounted from another server. When it reaches model.cuda(), PyTorch takes some time to move something over the NFS link.

But only my script is on NFS. Both PyTorch (under conda) and the data are on the local disk.

So I guess that when PyTorch compiles those CUDA source files, it uses the directory where the script lives, or the user’s home directory? (Even when I run the script from a local directory, model.cuda() still takes some time, so PyTorch probably doesn’t use the working directory?)

The question is: if this hypothesis is correct, is there a way (or a simple modification) to ask PyTorch to use a local path for the compilation work?
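
For example, if it is really the CUDA JIT compilation cache (which I believe defaults to a directory under the user’s home, something like ~/.nv/ComputeCache), would pointing it at local disk before the first CUDA call be a reasonable workaround? This is just a guess:

import os

# guess: relocate the CUDA JIT compute cache from the NFS-mounted home
# directory to local disk; must be set before the first CUDA call
os.environ["CUDA_CACHE_PATH"] = "/tmp/cuda_cache"

import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for the real model
model.cuda()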

@jtang10 if you are using a 1080 Ti you have to use CUDA 8.0 (and install the CUDA 8.0 build of PyTorch). Otherwise the startup time will be very slow, because CUDA 7.5 cannot support the 1080 Ti by default, so the GPU kernels end up being recompiled for it at startup.
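
A quick way to sanity-check which GPU and compute capability your install sees, assuming these helpers are available in your build:

import torch

print(torch.__version__)
print(torch.cuda.get_device_name(0))        # should report the 1080 Ti
print(torch.cuda.get_device_capability(0))  # Pascal cards report (6, 1)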


Does CUDA 9.0 have a similar issue? I seem to have the same problem.
I’m using a 1080 Ti, CUDA 9.0, cuDNN 7003. I installed PyTorch from source initially, but when I tried to install torchvision using conda, my PyTorch installation may have been overridden by conda. When I run conda list, I can see both:

torch                     0.2.0+59e0472             <pip>
pytorch                   0.2.0           py36hf0d2509_4cu75    soumith

Running a simple script like this gives me the following results:

import torch
from datetime import datetime

for i in range(10):
    x = torch.randn(3, 4)
    t1 = datetime.now()
    x.cuda()
    print(i, datetime.now() - t1)

0 0:06:24.108245
1 0:00:00.000110
2 0:00:00.000055
3 0:00:00.000048
4 0:00:00.000046
5 0:00:00.000046
6 0:00:00.000044
7 0:00:00.000044
8 0:00:00.000044
9 0:00:00.000044

Sorry, I’m new to PyTorch, so I might be doing something incorrectly. Thanks in advance.

In my experience, installing torchvision with conda after a source install overrides the pytorch source install. However, you can re-install pytorch from source and you’ll be using the latest pytorch again.

@Cognac what is your output of

import torch
torch.__version__
torch.version.cuda
torch.version.cudnn

?

Thanks for the immediate response!

>>> import torch
>>> torch.__version__
'0.2.0_4'
>>> torch.version.cuda
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torch.version' has no attribute 'cuda'
>>> torch.version.cudnn
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torch.version' has no attribute 'cudnn'

I’ve just edited my previous post to add the information from conda list. For some reason I can see both pytorch installed from conda and torch from compiling the source. Am I using the wrong version?

Interesting. You’re definitely not on pytorch master, you’re on the version that comes with conda.

If you want to build from source again (to see if your problem goes away on master, though I don’t know if it will), you can try the following (I’m assuming you have the pytorch source code somewhere):

pip uninstall torch 
pip uninstall torch # yes, this is intentional
cd pytorch
python setup.py install

Thanks, I’ll try it now. So after pip uninstall torch twice, should both torch and pytorch be removed from conda list? Do I need to remove pytorch as well?

I’m trying to get torch removed from your pip list. I think you’ll still see pytorch installed via conda – you shouldn’t remove that because I believe removing that will uninstall torchvision as well.