I have installed PyTorch on a Tesla V100. The problem is that it is way too slow; after 16 hours, not even one epoch of ImageNet had completed! I believe this should be much faster under normal circumstances.
I have CUDA 9 installed and the driver version reported by nvidia-smi is 384.98, which I think should be good enough. I also have the most recent NCCL 2 and the most recent version of cuDNN.
What else should I check to make sure the V100 is configured properly? And how can I fix the problem?
Did you make sure that the GPU is being utilized?
Check nvidia-smi to see the usage
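If you want to poll that from a script rather than watching the terminal, here is a small sketch that shells out to nvidia-smi for utilization and memory (the `gpu_usage` helper is just a name I made up; it falls back to a message when no NVIDIA driver is present):

```python
import shutil
import subprocess

# Hypothetical helper: query GPU utilization and memory once via nvidia-smi.
def gpu_usage():
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found (no NVIDIA driver on this machine)"
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return result.stderr.strip() or "nvidia-smi failed"
    return result.stdout.strip() or "nvidia-smi produced no output"

print(gpu_usage())
```

Running this in a loop during training makes it easy to spot utilization dropping to 0% between batches, which usually points at the input pipeline rather than the GPU.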
Yes, it was being used. I used a batch size of 64, occupying about 14 GB.
Does anyone have any ideas?
Which network are you using?
I am using a Wide ResNet. But this is a general problem: if I run ResNet-18 on CIFAR-100 on a Titan Xp, each epoch takes about 10 minutes. If I run the exact same code on a V100, it takes an hour, i.e. it is 6x slower.
I guess there must be something wrong with my settings, but I can’t figure out what. In the meantime I have installed CUDA 9.1 and driver version 387.26, but the problem persists.
What PyTorch version are you using?
I am using PyTorch 0.3. At first I built it from source, then removed that and installed it through conda. In both cases I got similar results.
Hi - it could be the case that your dataloader is taking more time than the model forward or backward pass.
Time the image loading properly: use time.perf_counter() instead of time.time(), and call torch.cuda.synchronize() before each time reading, e.g.:
torch.cuda.synchronize()
t0 = time.perf_counter()
output = model(input)
loss = criterion(output, target)
torch.cuda.synchronize()
print('forward:', time.perf_counter() - t0)
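The same pattern extends to separating data-loading time from compute time over a few iterations. A minimal sketch of that split, with hypothetical `load_batch`/`train_step` stand-ins for the real dataloader and model (substitute your own, and keep the torch.cuda.synchronize() calls at the marked points when timing GPU work):

```python
import time

# Hypothetical stand-ins for the real dataloader and training step.
def load_batch():
    time.sleep(0.01)       # pretend disk/decode work
    return list(range(64))

def train_step(batch):
    return sum(batch)      # pretend forward/backward

data_time = 0.0
compute_time = 0.0
for _ in range(5):
    t0 = time.perf_counter()
    batch = load_batch()
    # torch.cuda.synchronize() here if the loader touches the GPU
    t1 = time.perf_counter()
    loss = train_step(batch)
    # torch.cuda.synchronize() here before reading the clock on a GPU
    t2 = time.perf_counter()
    data_time += t1 - t0
    compute_time += t2 - t1

print(f"data: {data_time:.3f}s  compute: {compute_time:.3f}s")
```

If data time dominates, the V100 is simply starved; more dataloader workers or faster storage will help more than any GPU-side setting.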
It is weird indeed… I can’t think of a good explanation for this. Does the same thing happen with your other GPU applications?
Are you using
torch.backends.cudnn.benchmark = True?
Yes - that improved things somewhat (a 2x speedup), but it is still not as fast as I expected.
Hi, I also encountered the same problem. Have you solved it yet?
Hi, have you solved the problem? I also encountered the same problem. My code runs quickly on a TITAN Xp. After I copied it to a DGX-1 with V100s, it is very slow (about 3x slower than the TITAN Xp). By the way, I have set
torch.backends.cudnn.benchmark = True.
Hi @antspy, I encountered the same problem as you. In my case, I found loss.backward() is 5 times slower on the V100 than on the TITAN Xp. Did you solve the problem? Could you provide any suggestions? Thank you.
My environment settings are:
- Python 3.6.6
- PyTorch 1.0
- CUDA 10.0
- cuDNN 7.4.2
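To localize whether backward really dominates (and which kernels inside it), the autograd profiler available in that PyTorch generation can help. A hedged sketch, using a tiny placeholder Linear model rather than your actual network, and guarded so it degrades gracefully when PyTorch isn't installed:

```python
# Sketch: profile one forward/backward step to see which ops dominate.
# The Linear model and tensor shapes are placeholders, not the real net.
try:
    import torch
    import torch.nn as nn

    model = nn.Linear(128, 10)
    x = torch.randn(32, 128)

    # use_cuda=True records CUDA kernel times when a GPU is available
    with torch.autograd.profiler.profile(
            use_cuda=torch.cuda.is_available()) as prof:
        loss = model(x).sum()
        loss.backward()
    report = prof.key_averages().table(sort_by="cpu_time_total")
except ImportError:
    report = "PyTorch not installed; nothing to profile"

print(report)
```

Comparing the resulting table between the TITAN Xp and the V100 should show which specific operations account for the 3-5x gap, which is much more actionable than whole-epoch timings.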