I ran my code on a GTX 1080 Ti and an RTX 2080 Ti and found that the 2080 Ti is slower than the 1080 Ti. The time for one batch on the 1080 Ti is around 1.9 s, while on the 2080 Ti it is around 3.3 s. I tried torch==1.2.0 and torch==1.1.0 with torchvision==0.4.0 and torchvision==0.2.0, but the results are similar. I don't understand. Can anyone help me?
Could you provide a snippet reproducing your issue?
You can use this repo. https://github.com/JiaRenChang/PSMNet.git
My code is based on it.
If you really want to get help, it would be better to provide a precise script reproducing this effect. Personally, I won’t go and do it for you, and as far as I know this community, unless you are lucky, people won’t do it either.
You can run
python Test_img.py --loadmodel (finetuned PSMNet) --leftimg ./left.png --rightimg ./right.png --isgray False
in this repo.
The Test_img.py script only runs the computation on two images, once. It also measures the time it takes to transfer the images from CPU -> GPU -> CPU. Since there is so little data, the measured time might be dominated by this transfer time rather than the actual model computation. You might actually run the script faster on the CPU.
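To check whether the transfer or the compute dominates, you could time the two parts separately, synchronizing before reading the clock so queued GPU work is actually finished. A minimal sketch (the small convolution is a stand-in for the PSMNet model, not the real network):

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the PSMNet model.
model = torch.nn.Conv2d(3, 32, 3, padding=1).to(device).eval()
img = torch.randn(1, 3, 64, 128)

def sync():
    # CUDA calls are asynchronous; wait for queued GPU work before timing.
    if device == "cuda":
        torch.cuda.synchronize()

# Warm-up run: excludes one-time CUDA context / kernel setup cost.
with torch.no_grad():
    _ = model(img.to(device))
sync()

# 1) host-to-device transfer time only
t0 = time.time()
img_dev = img.to(device)
sync()
transfer = time.time() - t0

# 2) compute time only
t0 = time.time()
with torch.no_grad():
    out = model(img_dev)
sync()
compute = time.time() - t0

print(f"transfer: {transfer:.4f}s  compute: {compute:.4f}s")
```

If the transfer term dominates, the single-image script says little about which GPU computes faster.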
What is the goal, if you don’t mind me asking? To run this script as fast as possible? To train, or to run on a webcam? To verify that your 2080 is faster than the 1080?
Thanks for your replies. My goal is to train this network on different data, and I found it runs faster on the 1080 than on the 2080. I guess something is wrong, so the 2080 does not perform well. I hope to find the reason so that I can train the network faster on my 2080. You can train it using main.py or finetune.py in this repo. Thanks.
I don’t have both a 1080 and a 2080 to help you try these things. A good start would be to manually monitor the output of nvidia-smi for the two GPUs while training and see if the Volatile GPU-Util percentage is good.
This might be of use to keep the output updated:
watch -n 0.2 nvidia-smi
I didn’t find anything different between the nvidia-smi outputs of the 1080 and the 2080.
Are these on the same machine? Are you using one of them to drive your display as well? If so, that might explain it. In all the experiments I have run, the 2080 has been just as fast in each and every case.
They are on different machines, but the CPU and memory are the same. And I tried two sets of machines and the results are similar. I mean: cpu1 + 2080, cpu1 + 1080, cpu2 + 2080, cpu2 + 1080. So I wonder if there is some code that is not well supported by PyTorch on the 2080.
Trying the profiler might give you an idea of what is happening.
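For instance, with `torch.autograd.profiler` you can get per-op timings and compare the tables from the two cards to see which operators slow down on the 2080. A minimal sketch (the tiny conv is a placeholder for the actual model):

```python
import torch
from torch.autograd import profiler

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 32, 3, padding=1).to(device)
x = torch.randn(1, 3, 64, 64, device=device)

# Record per-operator times for one forward pass; on a GPU, also
# record CUDA kernel times.
with profiler.profile(use_cuda=(device == "cuda")) as prof:
    with torch.no_grad():
        y = model(x)

# Aggregate by op name; on GPU, compare the CUDA time columns
# between the 1080 and the 2080 runs.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

If one operator (e.g. a 3D convolution, which PSMNet uses heavily) accounts for most of the gap, that narrows the search to a cuDNN kernel choice rather than the whole setup.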
As mentioned earlier, I’d look at details of the PCIe bus/speed, after that I’d look at temperature throttling.
Compare the PCI details, and compare the output of
nvidia-smi dmon -s pucte
while running the code on the 1080 vs the 2080.
I have faced the same issue (10 s/iter on a 2080 Ti vs 2.5 s/iter on a 1080 Ti) in my own work, and I have tested on many platforms (including Ubuntu 16.04/18.04 and Windows 10). It’s definitely not a hardware failure, but the real reason isn’t clear (maybe it’s NVIDIA’s fault). Here are some suggestions that helped:
- Modifying your network architecture.
- Apex is helpful.
Yeah, it seems strange given the specs; however, the newer cards are optimized mostly for half precision.
I’d try using Apex AMP, as Joe1 pointed out above.
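The suggestion in this thread was NVIDIA’s Apex AMP; in current PyTorch the equivalent functionality is built in as `torch.cuda.amp`. A minimal mixed-precision training-step sketch (the tiny conv and MSE loss are placeholders, not PSMNet code):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 32, 3, padding=1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Loss scaling guards against fp16 gradient underflow; it is a no-op
# when disabled (e.g. on CPU).
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 3, 64, 64, device=device)
target = torch.randn(4, 32, 64, 64, device=device)

opt.zero_grad()
# autocast runs eligible ops in half precision, which is where the
# Turing tensor cores pay off.
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    loss = torch.nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```

With Apex itself, the analogous setup is `model, optimizer = amp.initialize(model, optimizer, opt_level="O1")` followed by `amp.scale_loss` around the backward pass.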
Hi Joe1, I ran into a comparable problem: a PyTorch model runs 2x faster on an RTX 2070 than on a TITAN RTX on the same system. Do you think it could still be the same problem, even though they are both Turing? Do you have any new findings on this?
Sorry, I didn’t find an efficient way. Rewriting your code may help.