torch.cuda.device_count() behavior on multiple nodes

Hi team.

I want to confirm the expected behavior of torch.cuda.device_count() in a multi-node environment. Should it return the device count of a single node or the total device count across all nodes?

Let’s say I have 2 GPU VMs, each with 4 GPU devices. When I run torch.cuda.device_count() in a multi-node job (for example, launched with torchrun), the call in each terminal returns 4. That means torch.cuda.device_count() returns the device count of a single node rather than the total across nodes. Is this expected?

Thanks!

Yes, that’s expected, as the Python process wouldn’t have knowledge of other nodes. torch.cuda.device_count() only reports the GPUs visible to the local process.
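If you need job-wide numbers, one option is to read the environment variables that torchrun exports to each worker. The following is a minimal sketch, not an official recipe: the helper names `local_gpu_count` and `job_layout` are made up for illustration, and the node count assumes every node launches the same number of processes.

```python
import os
from typing import Mapping


def local_gpu_count() -> int:
    """Number of GPUs visible to this process on this node only."""
    import torch  # imported lazily so job_layout() works without PyTorch

    return torch.cuda.device_count()


def job_layout(env: Mapping[str, str] = os.environ) -> dict:
    """Read the process layout that torchrun exports as environment variables."""
    # WORLD_SIZE: total number of processes in the job (all nodes).
    world_size = int(env.get("WORLD_SIZE", "1"))
    # LOCAL_WORLD_SIZE: number of processes launched on this node.
    local_world_size = int(env.get("LOCAL_WORLD_SIZE", "1"))
    return {
        "world_size": world_size,
        "local_world_size": local_world_size,
        # Assumption: every node runs the same number of processes.
        "num_nodes": world_size // local_world_size,
    }
```

With 2 nodes and `--nproc_per_node=4`, `job_layout()` should report a world size of 8 in every process, while `local_gpu_count()` still returns 4. The world size equals the total GPU count only if you launch one process per GPU.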

Thanks @ptrblck for confirming!