Dual 4090 vs. dual 3090 vs. single 4090

Hi everyone!
This is my first post here, and I hope it is not in the wrong place (please move it if it is).

My first plan was to build an AI PC with two RTX 4090 GPUs. But I discovered that NVLink is not available in this generation, and that P2P isn’t supported for this card in PyTorch.
On the other hand, the RTX 3090 does support NVLink and, as I understand it, PyTorch can then treat the two 3090s as one larger 48GB GPU, which would be a big advantage for a dual 3090 build when working on large models.
Moreover, some people believe a single 4090 setup is still faster/more powerful than dual 3090s (I don’t know why).
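
For anyone who wants to check the P2P and memory points on their own machine, here is a minimal sketch of how it could be queried from PyTorch. It assumes PyTorch with CUDA support is installed (it returns empty results otherwise), and `describe_gpus` is just a helper name made up for this post. As far as I know, each physical GPU shows up as its own `cuda:N` device with its own memory, so the output should show directly whether the "one 48GB GPU" idea holds.

```python
# Sketch: list the GPUs PyTorch sees and whether each pair reports peer (P2P) access.
# `describe_gpus` is a hypothetical helper name, not part of PyTorch.
try:
    import torch
except ImportError:  # PyTorch not installed: report nothing instead of crashing
    torch = None


def describe_gpus():
    """Return (devices, peer_access) as PyTorch reports them on this machine."""
    devices, peer_access = [], {}
    if torch is None or not torch.cuda.is_available():
        return devices, peer_access
    count = torch.cuda.device_count()
    for i in range(count):
        props = torch.cuda.get_device_properties(i)
        # (index, name, total memory in GiB)
        devices.append((i, props.name, round(props.total_memory / 1024**3, 1)))
    for i in range(count):
        for j in range(count):
            if i != j:
                # True if device i can directly access memory on device j
                peer_access[(i, j)] = torch.cuda.can_device_access_peer(i, j)
    return devices, peer_access


if __name__ == "__main__":
    devices, peer_access = describe_gpus()
    for index, name, gib in devices:
        print(f"cuda:{index}  {name}  {gib} GiB")
    for (src, dst), ok in peer_access.items():
        print(f"cuda:{src} -> cuda:{dst} peer access: {ok}")
```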

So I am confused about what the best consumer AI setup is. My goal is to build the most powerful home AI computer for my own projects over the next 3 to 5 years, by choosing one of the following options:

  1. A dual RTX 4090 build
  2. A dual RTX 3090 build
  3. A single RTX 4090 build

I would like to run Stable Video Diffusion, Tortoise TTS, Falcon 7B LLM, OpenAI Whisper, etc., and be able to train (or at least fine-tune) them on my local computer as fast as possible. Please help me reach a final decision!

One thing I am not sure about: if a model needs 16GB of VRAM to run, how much VRAM is needed to train or fine-tune it?
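
To make the question concrete, here is a rough back-of-the-envelope sketch based on parameter count alone. The bytes-per-parameter figures are rule-of-thumb assumptions (fp16 weights for inference; mixed-precision training with the Adam optimizer for full fine-tuning), not measurements, and `estimate_vram_gb` is a name made up for this post. Activations, batch size, and framework overhead are not included, so real usage would be higher.

```python
# Rough VRAM estimate from parameter count alone.
# Activations and framework overhead are NOT included.
# The bytes-per-parameter values are rule-of-thumb assumptions.
BYTES_PER_PARAM = {
    # fp16 weights only
    "inference_fp16": 2,
    # fp16 weights (2) + fp16 gradients (2)
    # + fp32 master weights, Adam momentum and variance (4 + 4 + 4)
    "full_finetune_adam": 16,
}


def estimate_vram_gb(num_params, mode):
    """Return the estimated memory in GB (10**9 bytes) for the given mode."""
    return num_params * BYTES_PER_PARAM[mode] / 1e9


if __name__ == "__main__":
    params_7b = 7e9  # assumed parameter count for a "7B" model
    for mode in BYTES_PER_PARAM:
        print(f"{mode}: ~{estimate_vram_gb(params_7b, mode):.0f} GB")
```

Under these assumptions a 7B model comes out at roughly 14 GB for fp16 inference and roughly 112 GB for full fine-tuning with Adam, before activations.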