Dual 4090 VS Dual 3090 VS single 4090

Hi everyone!
This is my first post here; I hope it isn't in the wrong place (please move it if it is).

My original plan was to build an AI PC with two RTX 4090 GPUs. But I discovered that NVLink is not available on this generation, and P2P isn't supported for these cards in PyTorch.
The RTX 3090, on the other hand, does support NVLink, and I've read that PyTorch can treat two 3090s as one larger 48 GB GPU, which would be a big advantage of a dual-3090 build for working on large models.
Moreover, some people believe a single 4090 setup is still faster/more powerful than dual 3090s (I don't know why).

So I'm confused: what is the best consumer AI setup? My goal is to build the most powerful home AI computer for my own projects over the next 3–5 years, by choosing one of the following options:

  1. A dual RTX 4090 build
  2. A dual RTX 3090 build
  3. A single RTX 4090 build

I'd like to run Stable Video Diffusion, Tortoise TTS, Falcon 7B, OpenAI Whisper, etc., and be able to train (or at least fine-tune) them on my local machine as fast as possible. Please help me reach a final decision!

Also, I'm not sure about the numbers here: if a model needs 16 GB of VRAM just to run, how much VRAM is needed to train or fine-tune it?
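As a rough rule of thumb (a back-of-envelope sketch, not an exact answer: it assumes fp16 weights for inference and a mixed-precision Adam setup for training, and ignores activations, KV cache, and framework overhead, which all add more):

```python
def vram_estimate_gb(n_params_billion: float) -> dict:
    """Back-of-envelope VRAM estimates for a dense transformer.

    Inference: 2 bytes/param (fp16 weights).
    Training (mixed-precision Adam): 2 (fp16 weights) + 2 (fp16 grads)
    + 4 (fp32 master weights) + 4 + 4 (fp32 Adam moments) = 16 bytes/param.
    Activations, KV cache, and framework overhead are NOT included.
    """
    n = n_params_billion * 1e9
    gib = 1024 ** 3
    inference = 2 * n / gib
    training = 16 * n / gib
    return {"inference_gb": round(inference, 1),
            "training_gb": round(training, 1)}

print(vram_estimate_gb(7))  # e.g. Falcon 7B
# → {'inference_gb': 13.0, 'training_gb': 104.3}
```

This is why a 7B model that runs comfortably on a single 24 GB card can be far out of reach for full training on the same hardware, and why parameter-efficient fine-tuning (LoRA, QLoRA, etc.) is so popular on consumer GPUs.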

I would just like to clarify that dual 3090s would have to be used as two GPUs. While PyTorch can use multiple GPUs efficiently with DDP, FSDP, etc., multi-GPU support is not fully transparent at the framework level: two GPUs will not appear as a single GPU, even with NVLink.
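To illustrate what "two GPUs, not one big one" means in practice, here is a minimal DDP sketch (my own example, assuming two CUDA devices and the NCCL backend): each process owns one GPU, gradients are all-reduced between them, and the model must still fit on each card individually.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank: int, world_size: int) -> None:
    # One process per GPU; NCCL handles the gradient all-reduce.
    dist.init_process_group(
        "nccl", init_method="tcp://127.0.0.1:29500",
        rank=rank, world_size=world_size,
    )
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(1024, 1024).cuda(rank)
    # The full model lives on EACH card -- not one pooled 48 GB device.
    ddp_model = DDP(model, device_ids=[rank])
    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
    for _ in range(10):
        x = torch.randn(32, 1024, device=rank)
        loss = ddp_model(x).sum()
        opt.zero_grad()
        loss.backward()  # gradients are averaged across GPUs here
        opt.step()
    dist.destroy_process_group()


if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    if n_gpus >= 2:
        torch.multiprocessing.spawn(train, args=(n_gpus,), nprocs=n_gpus)
    else:
        print(f"found {n_gpus} GPU(s); need at least 2 for this sketch")
```

If the model itself doesn't fit on one card, you'd reach for FSDP or model parallelism instead, which shard parameters across devices at the cost of extra communication.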

Thank you for the clarification! But I read other topics in the forum about multiple 4090 setups having problems with PyTorch because of the lack of P2P support in 4000-series cards. Is that resolved by now?

The driver issue that wrongly reported P2P access as supported is resolved, yes.

So does PyTorch work with dual RTX 4090s the same way it does with dual RTX 3090s? I mean, in terms of recognizing multiple GPUs and taking full advantage of their combined power/speed in parallel processing?

Based on the topic here, DDP training on RTX 4090 (ADA, cu118) - #25 by thefreeman, it seems we cannot take full advantage of two 4090 cards in PyTorch?

Yes, they have fixed the error, but the warning is still there: "Device 1 is not peer capable with some other selected peers, skipping".

You can use multiple 4090s without P2P access.
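For what it's worth, you can check what the driver reports on your own box with `torch.cuda.can_device_access_peer` (a small sketch; it only says anything interesting with at least two CUDA devices):

```python
import torch

# Query the driver's peer-access matrix between every pair of GPUs.
n = torch.cuda.device_count()
if n >= 2:
    for a in range(n):
        for b in range(n):
            if a != b:
                ok = torch.cuda.can_device_access_peer(a, b)
                print(f"P2P {a} -> {b}: {'yes' if ok else 'no'}")
else:
    print(f"found {n} CUDA device(s); nothing to check")
```

On dual 4090s you'd expect "no" for every pair; NCCL then falls back to staging transfers through host memory, which works but is slower than a direct GPU-to-GPU path.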

Doesn't the lack of P2P affect speed?
Or are there any other disadvantages to a dual-4090 setup compared with dual 3090s utilizing NVLink?

I'm getting 2x 4090s soon; I'll try running them both at PCIe 4.0 x16 and will probably run some benchmarks on communication speed (whether the training speedup is linear or not; I won't have any CPU bottlenecks).

Hi Dazzle! Any news about communication speed between your two 4090s?