Simple code example with NVLink support

I need to test a system that has multiple GPUs, linked with NVLink. What is the minimal code I could run to show NVLink actually works?

See this simple code example - how would you change it to take advantage of NVLink?

DistributedDataParallel via NCCL would use NVLink, if available.
For a quick performance test, I would recommend running nccl-tests and also verifying the connections between the GPUs via nvidia-smi topo -m. In that topology matrix, GPU pairs connected through NVLink are listed as NV# entries.
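
If you want a PyTorch-only check in the same spirit as nccl-tests, a rough sketch is to time repeated all_reduce calls over the NCCL backend and convert the result to bus bandwidth. This assumes a single node with at least two GPUs; the function names, the 256 MB buffer, the iteration counts, and the port are illustrative choices, not anything prescribed by PyTorch or NCCL.

```python
import os
import time


def bus_bandwidth_gbps(num_bytes, seconds, world_size):
    """All-reduce bus bandwidth in GB/s, following the nccl-tests convention."""
    alg_bw = num_bytes / seconds / 1e9
    return alg_bw * 2 * (world_size - 1) / world_size


def worker(rank, world_size, num_bytes, iters):
    # Local imports so the helper above stays usable without PyTorch installed.
    import torch
    import torch.distributed as dist

    # Single-node rendezvous; address and port are assumptions for this sketch.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Zeros stay zero under repeated summation, so the values never overflow.
    x = torch.zeros(num_bytes // 4, dtype=torch.float32, device="cuda")

    for _ in range(5):  # warm-up, lets NCCL set up its communicators
        dist.all_reduce(x)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

    if rank == 0:
        bw = bus_bandwidth_gbps(num_bytes, elapsed, world_size)
        print(f"all_reduce of {num_bytes / 1e6:.0f} MB: {bw:.1f} GB/s bus bandwidth")
    dist.destroy_process_group()


def main():
    import torch
    import torch.multiprocessing as mp

    world_size = torch.cuda.device_count()
    if world_size < 2:
        print("This check needs at least two GPUs.")
        return
    # One process per GPU; mp.spawn passes the rank as the first argument.
    mp.spawn(worker, args=(world_size, 256 * 1024 * 1024, 20), nprocs=world_size)


if __name__ == "__main__":
    main()
```

To see whether NVLink is what carries the traffic, one option is to run the script a second time with the NCCL environment variable NCCL_P2P_DISABLE=1, which turns off direct GPU-to-GPU transfers, and compare the two numbers. Setting NCCL_DEBUG=INFO additionally makes NCCL log the transports it selects.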

I have to actually demo PyTorch, so I’ll see if I can get DistributedDataParallel to work.
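
For a demo along these lines, a minimal DistributedDataParallel sketch might look like the following. It assumes a single node with at least two GPUs and one process per GPU; the model size, batch size, step count, port, and the names ddp_worker and main are placeholders chosen for illustration.

```python
import os


def ddp_worker(rank, world_size, steps):
    # Local imports: the file can be imported without PyTorch present.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Single-node rendezvous; address and port are assumptions for this sketch.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # A large linear layer gives the gradient all-reduce something to move.
    model = torch.nn.Linear(4096, 4096).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(steps):
        inputs = torch.randn(64, 4096, device=f"cuda:{rank}")
        loss = ddp_model(inputs).square().mean()
        optimizer.zero_grad()
        loss.backward()  # DDP all-reduces the gradients across GPUs via NCCL
        optimizer.step()

    if rank == 0:
        print(f"finished {steps} DDP steps on {world_size} GPUs")
    dist.destroy_process_group()


def main():
    import torch
    import torch.multiprocessing as mp

    world_size = torch.cuda.device_count()
    if world_size < 2:
        print("This demo needs at least two GPUs.")
        return
    mp.spawn(ddp_worker, args=(world_size, 10), nprocs=world_size)


if __name__ == "__main__":
    main()
```

Nothing in the script refers to NVLink explicitly: the NCCL backend decides how the GPUs talk to each other, so the same code runs over PCIe on machines without NVLink.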

nccl-tests looks like a useful hardware / config check, I’ll give it a try.