Hi everyone!
This is my first post here, and I hope it's not in the wrong place (please move it if it is).
My original plan was to build an AI PC with two RTX 4090 GPUs, but I discovered that NVLink is not available in this generation and that P2P isn't supported for this card in PyTorch.
On the other hand, the RTX 3090 does support NVLink, and I've read that PyTorch can treat two 3090s as one larger 48GB GPU, which would be a big advantage of a dual-3090 build for working on large models.
Moreover, some people believe a single 4090 setup is still faster/more powerful than dual 3090s (I don't know why).
So I am confused about what the best consumer AI setup is. My goal is to build the most powerful home AI computer for my own projects over the next 3-5 years by choosing one of the following options:
A dual RTX 4090 build
A dual 3090 Build
A single 4090 build
I'd like to run Stable Video Diffusion, Tortoise TTS, Falcon 7B LLM, OpenAI Whisper, etc., and be able to train (or at least fine-tune) them on my local machine at the fastest possible speed. Please help me reach a final decision!
Also, if a model needs 16GB of VRAM just to run, how much VRAM would be needed to train or fine-tune it?
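Here's my rough back-of-the-envelope math, please correct me if I'm wrong (just a sketch assuming fp16 weights and a standard Adam optimizer, ignoring activation memory):

```python
# Rough VRAM estimate for full fine-tuning with Adam (my assumptions:
# fp16 weights + fp16 gradients + fp32 Adam moment/variance states).
def train_vram_gb(n_params_billion: float) -> float:
    n = n_params_billion * 1e9
    weights = 2 * n       # fp16 weights: 2 bytes/param
    grads = 2 * n         # fp16 gradients: 2 bytes/param
    adam_states = 8 * n   # fp32 momentum + variance: 4 + 4 bytes/param
    return (weights + grads + adam_states) / 1e9  # GB, excluding activations

# Falcon 7B: inference alone is ~14 GB in fp16 (7e9 params * 2 bytes).
print(f"Falcon 7B full fine-tune: ~{train_vram_gb(7):.0f} GB (+ activations)")
# -> ~84 GB, so full fine-tuning wouldn't fit in 24 or 48 GB without
#    tricks like LoRA/QLoRA, gradient checkpointing, or CPU offload.
```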
Just to clarify: dual 3090s would have to be used as two separate GPUs. PyTorch can use multiple GPUs efficiently with DDP, FSDP, etc., but multi-GPU support is not fully transparent at the framework level; two GPUs will not appear as a single 48GB GPU even with NVLink.
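For example, a minimal DDP setup looks roughly like this (just a sketch with a toy model; the script name is hypothetical, launched with `torchrun --nproc_per_node=2 train.py`):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE for each process (one per GPU)
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # toy model
    model = DDP(model, device_ids=[local_rank])  # each GPU holds a full replica
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Note that each process holds a full copy of the model, so the model still has to fit on a single card. That's exactly why two 24GB cards don't behave like one 48GB card.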
Thank you for the clarification! But I've read other topics on this forum about multiple-4090 setups having problems with PyTorch because of the lack of P2P support in 4000-series cards. Has that been resolved by now?
So does PyTorch work with dual RTX 4090s the same way it does with dual RTX 3090s? I mean in terms of recognizing multiple GPUs and taking full advantage of their power/speed in parallel computing?
Doesn't the inability to use P2P affect speed?
Or are there any other disadvantages to a dual-4090 build compared with a dual-3090 build utilizing NVLink?
I'm getting 2x 4090s soon. I'll try running them both at PCIe 4.0 x16 and will probably run some benchmarks on communication speed (i.e., whether the training speedup is linear or not; I won't have any CPU bottlenecks).
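Something like this is what I have in mind for the communication check (a rough sketch; the script name is made up, and as far as I understand NCCL routes through host/PCIe when P2P is unavailable):

```python
# Check P2P availability and time an all_reduce between 2 GPUs.
# Run with: torchrun --nproc_per_node=2 bench_comm.py
import os, time
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(rank)

if rank == 0:
    # Expected False on 4090s (no P2P), True on NVLinked 3090s
    print("P2P 0<->1:", torch.cuda.can_device_access_peer(0, 1))

x = torch.randn(256 * 1024 * 1024 // 4, device=rank)  # 256 MB fp32 tensor
for _ in range(5):  # warmup
    dist.all_reduce(x)
torch.cuda.synchronize()

iters = 20
t0 = time.perf_counter()
for _ in range(iters):
    dist.all_reduce(x)
torch.cuda.synchronize()
dt = (time.perf_counter() - t0) / iters

if rank == 0:
    size_gb = x.numel() * 4 / 1e9
    # NCCL bus bandwidth = algbw * 2*(n-1)/n; for 2 GPUs the factor is 1
    print(f"all_reduce 256MB: {dt*1e3:.2f} ms, ~{size_gb/dt:.1f} GB/s")

dist.destroy_process_group()
```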
A single RTX 4090 outperforms dual RTX 3090s in many AI tasks and offers better performance per watt, though it has less total VRAM (24GB vs. 48GB combined). Dual RTX 3090s give you more memory for large models, but at the cost of higher energy consumption and potential multi-GPU scaling inefficiencies. For workloads that fit on a single GPU, the 4090 offers better overall efficiency.