I ran my code on a GTX 1080 Ti and an RTX 2080 Ti and found that the 2080 Ti is slower than the 1080 Ti. The time for one batch on the 1080 Ti is around 1.9 s, while on the 2080 Ti it is around 3.3 s. I tried torch==1.2.0 and torch==1.1.0 with torchvision==0.4.0 and torchvision==0.2.0, but the results are similar. I don't understand. Can anyone help me?
Could you provide a snippet reproducing your issue?
You can use this repo. https://github.com/JiaRenChang/PSMNet.git
My code is based on it.
If you really want to get help, it would be better to provide a precise script reproducing this effect. Personally, I won’t go and do it for you, and as far as I know this community, unless you are lucky, people won’t do it either.
You can run
python Test_img.py --loadmodel (finetuned PSMNet) --leftimg ./left.png --rightimg ./right.png --isgray False
in this repo.
The Test_img.py script only runs the computation on two images, once. It also measures the time it takes to transfer the images from CPU -> GPU -> CPU. Since there is so little data, the measured time might be dominated by this transfer time rather than the actual model computation. You might actually run the script faster on the CPU.
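To check whether the transfer or the compute dominates, you could time the two parts separately, synchronizing before reading the clock so queued GPU work is actually finished. A minimal sketch (the small convolution is a stand-in for the PSMNet model, not the real network):

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the PSMNet model.
model = torch.nn.Conv2d(3, 32, 3, padding=1).to(device).eval()
img = torch.randn(1, 3, 64, 128)

def sync():
    # CUDA calls are asynchronous; wait for queued GPU work before timing.
    if device == "cuda":
        torch.cuda.synchronize()

# Warm-up run: excludes one-time CUDA context / kernel setup cost.
with torch.no_grad():
    _ = model(img.to(device))
sync()

# 1) host-to-device transfer time only
t0 = time.time()
img_dev = img.to(device)
sync()
transfer = time.time() - t0

# 2) compute time only
t0 = time.time()
with torch.no_grad():
    out = model(img_dev)
sync()
compute = time.time() - t0

print(f"transfer: {transfer:.4f}s  compute: {compute:.4f}s")
```

If the transfer term dominates, the single-image script says little about which GPU computes faster.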
What is the goal, if you don’t mind me asking? To run this script as fast as possible? To train, or to run on a webcam? To verify that your 2080 is faster than the 1080?
Thanks for your replies. My goal is to train this network on different data, and I found it runs faster on the 1080 than on the 2080. I guess something is wrong, so the 2080 does not perform well. I hope to find the reason so that I can train the network faster on my 2080. You can train it using main.py or finetune.py in this repo. Thanks.
I don’t have both a 1080 and a 2080 to help you try these things. A good start would be to manually monitor the output of nvidia-smi for the two GPUs while training and see if the Volatile GPU-Util percentage is good.
This might be of use to keep the output updated:
watch -n 0.2 nvidia-smi
I didn’t find anything different between the nvidia-smi outputs of the 1080 and the 2080.
Are these on the same machine? Are you using one of them to drive your display as well? If so, that might explain it. In all the experiments I have run, the 2080 has been just as fast in each and every case.
They are on different machines, but the CPU and memory are the same. And I tried two sets of machines and the results are similar. I mean: cpu1 + 2080, cpu1 + 1080, cpu2 + 2080, cpu2 + 1080. So I wonder if there is some code that is not well supported by PyTorch on the 2080.
Trying the profiler might give you an idea of what is happening.
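For instance, with `torch.autograd.profiler` you can get per-op timings and compare the tables from the two cards to see which operators slow down on the 2080. A minimal sketch (the tiny conv is a placeholder for the actual model):

```python
import torch
from torch.autograd import profiler

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 32, 3, padding=1).to(device)
x = torch.randn(1, 3, 64, 64, device=device)

# Record per-operator times for one forward pass; on a GPU, also
# record CUDA kernel times.
with profiler.profile(use_cuda=(device == "cuda")) as prof:
    with torch.no_grad():
        y = model(x)

# Aggregate by op name; on GPU, compare the CUDA time columns
# between the 1080 and the 2080 runs.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

If one operator (e.g. a 3D convolution, which PSMNet uses heavily) accounts for most of the gap, that narrows the search to a cuDNN kernel choice rather than the whole setup.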
As mentioned earlier, I’d look at details of the PCIe bus/speed, after that I’d look at temperature throttling.
Compare the PCI details, and compare the output of
nvidia-smi dmon -s pucte
while running the code on the 1080 vs the 2080.
I have faced the same issue (10 s/iter on a 2080 Ti vs 2.5 s/iter on a 1080 Ti) in my own work, and I have tested on many platforms (including Ubuntu 16.04/18.04 and Windows 10). It’s definitely not a hardware failure, but the real reason isn’t clear (maybe it’s NVIDIA’s fault). Here are some suggestions that helped:
- Modifying your network architecture.
- Apex is helpful.
Yeah, it seems strange given the specs; however, the newer cards are optimized mostly for half precision.
I’d try using Apex AMP, as Joe1 pointed out above.
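The suggestion in this thread was NVIDIA’s Apex AMP; in current PyTorch the equivalent functionality is built in as `torch.cuda.amp`. A minimal mixed-precision training-step sketch (the tiny conv and MSE loss are placeholders, not PSMNet code):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 32, 3, padding=1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Loss scaling guards against fp16 gradient underflow; it is a no-op
# when disabled (e.g. on CPU).
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 3, 64, 64, device=device)
target = torch.randn(4, 32, 64, 64, device=device)

opt.zero_grad()
# autocast runs eligible ops in half precision, which is where the
# Turing tensor cores pay off.
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    loss = torch.nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```

With Apex itself, the analogous setup is `model, optimizer = amp.initialize(model, optimizer, opt_level="O1")` followed by `amp.scale_loss` around the backward pass.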
Hi Joe1, I ran into a comparable problem: a PyTorch model runs 2x faster on an RTX 2070 than on a TITAN RTX on the same system. Do you think it could still be the same problem, even though they are both Turing? Do you have any new findings on this?
Sorry, I didn’t find an efficient way. Rewriting your code may help.