Performance on 2080 Ti + CUDA 10.0 + cuDNN 7.3.1

Hello, I’ve just set up a new machine with two 2080 Ti cards.
Compared with my other machine with two 1080 Ti cards (all other components, such as the CPU and mainboard, are identical), my code runs 50% slower on the 2080 Ti machine. The configurations are:

[Machine #1]: two 2080 Ti + CUDA 10.0 + cuDNN 7.3.1, Python 3.7 (Anaconda)

  • I skipped “conda install -c pytorch magma-cuda92” since magma for CUDA 10 is not available yet.

[Machine #2]: two 1080 Ti + CUDA 9.2 + cuDNN 7.3, Python 3.6.5 (Anaconda)

Does anyone have benchmark data for PyTorch on the 2080 Ti?

Thank you.

How did you install PyTorch? Did you build from source? A couple of fixes for CUDA 10 are not in a release yet.

I built from source (the latest version from GitHub).
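When comparing two machines, it helps to post the versions PyTorch itself reports, since a source build may be linked against a different CUDA or cuDNN than the one you think is installed. A minimal sketch using standard `torch` attributes; the helper name `describe_setup` is hypothetical:

```python
import torch


def describe_setup():
    """Collect the versions and GPUs this PyTorch build actually sees."""
    cudnn = torch.backends.cudnn
    info = {
        "torch": torch.__version__,
        # CUDA version PyTorch was compiled against (None for CPU-only builds).
        "cuda": torch.version.cuda,
        # cuDNN version as an integer, e.g. 7301 for 7.3.1; None if unavailable.
        "cudnn": cudnn.version() if cudnn.is_available() else None,
        "gpus": [],
    }
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            info["gpus"].append({
                "index": i,
                "name": torch.cuda.get_device_name(i),
                # (major, minor) compute capability, e.g. (7, 5) for Turing.
                "capability": torch.cuda.get_device_capability(i),
            })
    return info


if __name__ == "__main__":
    for key, value in describe_setup().items():
        print(f"{key}: {value}")
```

Running this on both machines shows at a glance whether the source build really picked up CUDA 10.0 and cuDNN 7.3.1.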

BTW, it seems to be caused by overheating of my device 0, which sits too close to device 1.
With DataParallel, device 1 has to wait for device 0, which is running at reduced power.
When I test device 1 alone, it seems to be faster than a single 1080 Ti.
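To compare device 0 alone, device 1 alone, and DataParallel on equal terms, each setup needs to be timed the same way. A minimal sketch, assuming you wrap one training iteration in a zero-argument callable; the names `median_step_time`, `step`, and `sync` are hypothetical. Because CUDA kernels launch asynchronously, the timer must wait for the GPU before reading the clock:

```python
import statistics
import time


def median_step_time(step, sync=lambda: None, warmup=5, iters=20):
    """Median wall-clock seconds per call of `step`.

    `sync` must block until the GPU work queued by `step` has finished;
    without it the timer mostly measures kernel-launch overhead.
    """
    for _ in range(warmup):  # skip one-off costs (allocator, cuDNN autotuning)
        step()
    sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        sync()
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

With PyTorch, pass the device explicitly, e.g. `sync=lambda: torch.cuda.synchronize(1)` when the model lives on `cuda:1`, because `torch.cuda.synchronize()` with no argument only waits for the current device. DataParallel gathers the replicas' outputs on device 0 before the step can finish, so its step time is bounded below by the slowest replica; a throttled card slows down the whole pair.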

I have no idea how to handle this overheating. :frowning:

Thank you!

Did you solve your issue, i.e., were you able to train with two RTX cards at a decent speed?

Ah, I had to switch to a mainboard that supports four PCIe x16 slots.
Then I placed the cards to maximize the space between them: [device0] [empty] [device1] [empty].
Now it seems to be OK.

When I installed them as [device0][device1], device0 became too hot within 5 minutes and throttled down.

It seemed OK when I used the 1080s. In fact there was a similar slowdown with the 1080 Ti, but I was not aware of it because it was not as pronounced or as frequent as with the 2080 Ti.
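A slowdown that only appears after a few minutes is easy to miss, as happened with the 1080 Ti. One way to catch it is to record the time of every training step and flag a sustained jump relative to the first steps. A minimal sketch; the function name, the window sizes, and the 1.3x threshold are illustrative assumptions, not tuned values:

```python
import statistics


def find_slowdown(step_times, baseline_steps=50, window=50, factor=1.3):
    """Return the index of the first step whose trailing-window median exceeds
    `factor` times the median of the first `baseline_steps` steps, or None.

    A sustained jump after a few minutes of training is consistent with
    thermal throttling; short spikes (e.g. a checkpoint save) are smoothed
    out by taking the median over the window.
    """
    if len(step_times) < baseline_steps + window:
        return None
    baseline = statistics.median(step_times[:baseline_steps])
    for end in range(baseline_steps + window, len(step_times) + 1):
        recent = statistics.median(step_times[end - window:end])
        if recent > factor * baseline:
            return end - 1
    return None
```

Because of the median window, the reported index lags the real onset by roughly half a window, which is fine for spotting the problem.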

Could you use CUDA 9.0 on the 2080 Ti?
Could you use PyTorch 0.3.0 on the 2080 Ti?

Well, I have not tried, but I heard the 2080 Ti needs CUDA 10.0, doesn’t it?

The 2080 Ti will run with CUDA 9.2.

However, I am seeing similar slowdowns with two 2080 Tis using DataParallel on CUDA 9.2. A single 2080 Ti is faster than a single 1080 Ti. In my case it does not appear to be a heat issue, since I have two blower-style 2080 Tis whose temperatures always stay below 68 °C.
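To tell heat-related throttling apart from other causes, it helps to log each card's temperature and SM clock while the DataParallel job runs: a throttled card shows a clock well below its partner's. A minimal sketch that polls `nvidia-smi`, assuming it is on the PATH; `index`, `temperature.gpu`, `clocks.sm`, and `power.draw` are standard `--query-gpu` fields (run `nvidia-smi --help-query-gpu` to confirm on your driver), while the function names are hypothetical:

```python
import subprocess

FIELDS = ("index", "temperature.gpu", "clocks.sm", "power.draw")


def _to_float(text):
    try:
        return float(text)
    except ValueError:
        return None  # nvidia-smi prints e.g. "[N/A]" for unsupported fields


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = dict(zip(FIELDS, (v.strip() for v in line.split(","))))
        rows.append({
            "index": int(values["index"]),
            "temp_c": _to_float(values["temperature.gpu"]),
            "sm_clock_mhz": _to_float(values["clocks.sm"]),
            "power_w": _to_float(values["power.draw"]),
        })
    return rows


def query_gpu_stats():
    """Sample every GPU once; call this periodically during training."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return parse_gpu_stats(out)
```

If both cards report similar clocks under load while DataParallel is still slow, the bottleneck is more likely elsewhere (for example, the scatter/gather traffic between the GPUs) than in cooling.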

I found that on less common architectures it is really important to set torch.backends.cudnn.benchmark = True
This made training run 5x faster on my older Maxwell Titan, and it may also apply to the Turing cards.
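A minimal sketch of where the flag goes, assuming a standard PyTorch install; the helper name `enable_cudnn_autotune` is hypothetical. Set it once at startup, before the first forward pass:

```python
import torch


def enable_cudnn_autotune():
    """Turn on cuDNN autotuning before building the model and data loader.

    With benchmark=True, the first convolution call for each new input shape
    times several cuDNN algorithms and caches the fastest one; later calls
    with the same shape reuse it. Every new input shape triggers a new search.
    """
    torch.backends.cudnn.benchmark = True
    # Whether cuDNN is usable by this build at all (False on CPU-only builds).
    return torch.backends.cudnn.is_available()
```

Autotuning pays off when input shapes stay fixed across iterations; with constantly changing shapes the repeated searches can cost more than they save. The selected algorithms may also differ between runs, so results are not guaranteed to be bit-for-bit reproducible.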