DDP training on RTX 4090 (ADA, cu118)

FWIW, I upgraded my system from dual 3090s (without NVLink) to dual 4090s, and now I’m seeing these issues. I have tried all of the suggested workarounds (NCCL_P2P_DISABLE, BIOS settings) without success. None of them were needed with the 3090s.
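For anyone who wants to try the NCCL_P2P_DISABLE workaround mentioned above, here is a minimal sketch of one way to apply it when launching a DDP job. The helper name `nccl_workaround_env` and the script name `train.py` are hypothetical; `NCCL_P2P_DISABLE` and `NCCL_DEBUG` are standard NCCL environment variables. As noted, this did not resolve the problem on my dual 4090 setup, so treat it as a diagnostic step rather than a fix.

```python
import os


def nccl_workaround_env(base_env=None):
    """Return a copy of the environment with the NCCL workaround applied.

    Hypothetical helper for illustration only; it does not modify os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Ask NCCL not to use direct GPU-to-GPU (peer-to-peer) transfers.
    env["NCCL_P2P_DISABLE"] = "1"
    # Verbose NCCL logging, useful for seeing which transport gets selected.
    env["NCCL_DEBUG"] = "INFO"
    return env


# Example launch (assumes torchrun is installed and train.py is your own script):
#   import subprocess
#   subprocess.run(
#       ["torchrun", "--nproc_per_node=2", "train.py"],
#       env=nccl_workaround_env(),
#       check=True,
#   )
```

The variables have to be set before the training processes start, which is why the sketch builds the environment for the launcher instead of setting it inside the training script after NCCL may already be initialized.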

An NVIDIA thread posted above has an update: NVIDIA has reproduced the problem and is investigating. The thread is "Standard nVidia CUDA tests fail with dual RTX 4090 Linux box".
Additionally, Intel platforms are seeing the same issue, and it also shows up with TensorFlow.