Hey folks,
I’m starting to become more familiar with PyTorch, and I have a few questions regarding its dependencies.
Q1- Does PyTorch require NCCL for a single GPU? If it is mandatory, why is that the case?
Q2- My initial understanding of NCCL is that it is mandatory for a distributed training environment, whether a single node with multiple GPUs (using NVSwitch/NVLink) or multi-node training. How does the torch installer communicate with the GPU fabric to figure out the GPU topology?
Q3- Furthermore, I would like to have a deeper understanding of the underlying dynamics between the GPU, NCCL and torch. Where should I start? Any help is much appreciated.
No, single-GPU use cases won’t use NCCL or any other communication library.
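To make that concrete, a plain single-GPU script never touches `torch.distributed` at all. A minimal sketch (assuming a standard CUDA build; it falls back to CPU if no GPU is available):

```python
# Minimal single-GPU workload: no process group is created, so NCCL never runs.
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 4).to(device)   # an ordinary module on one device
x = torch.randn(8, 16, device=device)
loss = model(x).sum()
loss.backward()                       # plain CUDA kernels, no collectives

# NCCL would only be involved if we explicitly set up torch.distributed, e.g.
# torch.distributed.init_process_group(backend="nccl") -- which we never do here.
```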
NCCL ships by default in our PyTorch builds with CUDA support and thus can be used in addition to e.g. Gloo for multi-GPU workloads. If you are using an NVSwitch/NVLink setup, I would recommend sticking with NCCL.
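For completeness, here is roughly what picking the backend looks like in a multi-GPU job. A minimal DDP sketch, assuming a Linux CUDA build and a launcher such as `torchrun` providing the rank environment variables:

```python
# Run with e.g.: torchrun --nproc_per_node=2 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # "nccl" for NVIDIA GPUs (NVLink/NVSwitch aware); "gloo" also supports
    # multi-GPU collectives but is generally slower for GPU workloads.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(16, 4).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    x = torch.randn(8, 16, device=local_rank)
    ddp_model(x).sum().backward()   # gradient all-reduce runs via NCCL here

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```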
I don’t understand the question, as the installer (I assume pip in this case?) has no responsibility for mapping the GPU topology.
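Topology detection happens at runtime, not at install time: the CUDA driver and NCCL inspect the available links when a job starts. A few runtime queries torch exposes (a minimal sketch, CUDA build assumed):

```python
# Topology is discovered at runtime by the driver/NCCL, never by pip.
import torch

print(torch.cuda.device_count())              # GPUs visible to this process
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# Peer-to-peer capability between two devices (NVLink/PCIe P2P), if present:
if torch.cuda.device_count() >= 2:
    print(torch.cuda.can_device_access_peer(0, 1))

# Outside Python, `nvidia-smi topo -m` prints the link matrix, and NCCL logs
# the topology it discovered when launched with NCCL_DEBUG=INFO.
```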
Regarding my second question, sorry if it was vague. I still don’t fully understand how torch works under the hood. I want a clearer understanding of the interplay between the physical device, the drivers, and torch. And yes, I’ve used pip. From your answer, it’s my understanding that torch is decoupled from the physical topology of the GPUs. If so, when using torch, who is responsible for using this topology? The programmer? Sorry if this is a bit convoluted.
Finally, when should I use Gloo? Multi-GPU without NCCL/NVSwitch/NVLink?
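Something like this is what I have in mind for the Gloo case, i.e. CPU tensors or boxes without NVLink (just a rough sketch on my end; the address/port values are placeholders):

```python
# Two processes on one machine doing a CPU all-reduce via the Gloo backend.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"   # placeholder rendezvous address
    os.environ["MASTER_PORT"] = "29500"       # placeholder port
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    t = torch.ones(4) * (rank + 1)
    dist.all_reduce(t)                        # sums across ranks on CPU
    print(rank, t)

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```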