RTX 3090 6 times slower than GTX 1080ti

Hi Guys,

I hope you are doing great. I am well aware of the GitHub issues on this topic (RTX3090 performs no better than 1080ti · Issue #50328 · pytorch/pytorch · GitHub; Anyone has use it in 3090? It seems been slow... · Issue #276 · traveller59/spconv · GitHub; etc.).

I’ve tried a lot of different containers from NVIDIA, and my RTX 3090 is still much slower than my 1080ti when performing forward 3x3 convs. What is going on?

Does anyone know how to fix it? None of the issues contains a solution.

Thanks

Could you post a minimal, executable code snippet reproducing this issue, please?
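For readers following along, a reproduction of this kind could look roughly like the sketch below. It is not the code from this thread: the function name `time_conv_forward`, the layer sizes, and the iteration counts are all placeholder assumptions. It times the forward pass of a single 3x3 convolution and calls `torch.cuda.synchronize()` before reading the clock, since CUDA kernels launch asynchronously and unsynchronized timings are a common source of misleading numbers.

```python
import time

import torch
import torch.nn as nn


def time_conv_forward(batch=8, channels=16, size=32, iters=5, warmup=2, device=None):
    """Return (mean seconds per forward pass, output shape) for one 3x3 conv.

    All sizes are illustrative defaults, not values taken from the thread.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    on_cuda = torch.device(device).type == "cuda"

    conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1).to(device)
    x = torch.randn(batch, channels, size, size, device=device)

    with torch.no_grad():
        # Warm-up iterations so one-time setup costs are not measured.
        for _ in range(warmup):
            conv(x)
        if on_cuda:
            torch.cuda.synchronize()  # wait for warm-up kernels to finish

        start = time.perf_counter()
        for _ in range(iters):
            y = conv(x)
        if on_cuda:
            torch.cuda.synchronize()  # wait for all timed kernels to finish
        elapsed = (time.perf_counter() - start) / iters

    return elapsed, y.shape
```

Running the same call, for example `time_conv_forward(batch=64, channels=256, size=56, iters=50, warmup=10)`, on each card with the same PyTorch build gives numbers that can be compared directly and posted alongside the script.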

Hi @ptrblck,

I was running a VGG-like model, and the 1080ti was noticeably faster. If you think it’s better to run your benchmark (from here → Convolution operations are extremely slow on RTX 30 series GPU · Issue #47039 · pytorch/pytorch · GitHub) on both cards, I’ll do it. I would rather avoid it if possible, so that I don’t have to swap GPUs.

Do you know if there is a fix for this problem? In the cuDNN release notes, I’ve seen that it is labeled as a known issue, but there is no fix.

It’s crazy that this problem exists in the first place, since the 3090 is the flagship. Do you have additional information about what is causing it?

Thanks a lot

As you can see in my comment from ~1.5 years ago, the 3090 was already performing better back then.
Unfortunately, users keep commenting with unverified claims, don’t share their model or how they profiled the code, and often don’t bother to follow up.

I still claim it’s fixed until I see a valid counterexample, so feel free to share something here.

Hi @ptrblck, I had time to benchmark both the 1080ti and the 3090 over the weekend. Yes, the 3090 is roughly 2 times faster than the 1080ti.

Code and benchmarks here → GitHub - FrancescoSaverioZuppichini/is-3090-good-for-computer-vision: A collection of benchmarks I've run

Thanks a lot. I don’t know why I thought it was slower in the first place.

Thanks for sharing. You could also check the performance for mixed-precision training (in channels-last memory layout), which might also show more speedups.