DDP training on RTX 4090 (ADA, cu118)

I was able to reproduce the issue reported by @iafoss on my multi 4090 system with AMD EPYC CPU (driver 525).

FWIW, I upgraded my system from dual 3090s (without NVLink) to dual 4090s and am now seeing these issues. I have tried all of the suggested workarounds (NCCL_P2P_DISABLE, BIOS settings). None of these were needed for 3090 operation.
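For anyone else hitting this, the NCCL_P2P_DISABLE workaround mentioned above is just an environment variable set before launching training; a minimal sketch (the torchrun invocation and train.py are illustrative, adjust to your own launcher and script):

```shell
# Workaround sketch: tell NCCL not to use the peer-to-peer transport,
# so DDP gradient all-reduce falls back to copies staged through host memory.
export NCCL_P2P_DISABLE=1

# Illustrative launch of a 2-GPU DDP job (commented out; your launcher may differ):
# torchrun --nproc_per_node=2 train.py

echo "NCCL_P2P_DISABLE=$NCCL_P2P_DISABLE"
```

Note this only sidesteps hangs/corruption by avoiding P2P; it does not recover P2P bandwidth.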

The NVIDIA thread linked above ("Standard nVidia CUDA tests fail with dual RTX 4090 Linux box") has an update: NVIDIA has reproduced the problem and is investigating. Additionally, Intel platforms are seeing the same issue, and it affects TensorFlow as well, not just PyTorch.

NVIDIA just updated that thread (Standard nVidia CUDA tests fail with dual RTX 4090 Linux box - #16 by abchauhan - Linux - NVIDIA Developer Forums):

… Feedback from Engineering is that Peer to Peer is not supported on 4090. The applications/driver should not report this configuration as peer to peer capable. The reporting is being fixed and future drivers will report the following instead… (in the simplep2p test)…

Peer to Peer access is not available amongst GPUs in the system, waiving test.

II. ./streamOrderedAllocationIPC
Device 1 is not peer capable with some other selected peers, skipping
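The "not peer capable" message in those sample tests comes from the CUDA runtime's peer-capability query, which the fixed drivers will report as unsupported on 4090s. A minimal sketch of that check, assuming a 2-GPU system (device IDs 0 and 1):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: ask the driver whether each GPU can directly access the
// other's memory. On dual-4090 boxes with the corrected drivers,
// both queries are expected to report 0 (P2P not supported).
int main() {
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, /*device=*/0, /*peerDevice=*/1);
    cudaDeviceCanAccessPeer(&canAccess10, /*device=*/1, /*peerDevice=*/0);
    printf("0->1 peer capable: %d, 1->0 peer capable: %d\n",
           canAccess01, canAccess10);
    return 0;
}
```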

That's a bummer for the many people who bought multi-4090 systems for ML.


So does that mean that if you only need to work with models smaller than 24 GB you should go for a single-4090 build, and if you want to work with bigger models you should go for dual 3090s (assuming someone wants to build an AI system around one of these GPUs)? See the following thread:
Dual 4090 VS Dual 3090 VS single 4090 - #4 by thefreeman

@greg_warzecha Is the NVIDIA P2P issue for multiple 4090s fixed with the newer driver? Are you getting the bandwidth