Hi team.
I want to confirm the expected behavior of torch.cuda.device_count()
in a multi-node environment. Should it return the device count on a single node, or the total device count across all nodes?
Let’s say I have 2 GPU VMs, each with 4 GPU devices. When I run torch.cuda.device_count()
in a multi-node setup (e.g. launched with torchrun), I observe that the command on each node returns 4. That suggests torch.cuda.device_count()
returns the device count on the local node rather than across all nodes. Is that expected?
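For context, torch.cuda.device_count() only sees the GPUs visible to the calling process (i.e. the local node, filtered by CUDA_VISIBLE_DEVICES), so the cluster-wide total has to come from the launcher instead. Here is a minimal sketch of how that can be derived from the environment variables torchrun sets; the default values are illustrative and simulate the 2-node × 4-GPU setup described above:

```python
import os

# Under torchrun, these env vars describe the job layout.
# (Illustrative defaults below simulate 2 nodes x 4 GPUs.)
local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", "4"))  # ranks on this node
world_size = int(os.environ.get("WORLD_SIZE", "8"))              # ranks across all nodes

# torch.cuda.device_count() reports only the GPUs visible to the local
# process, so with one process per GPU it matches LOCAL_WORLD_SIZE (4 here),
# not the global total. The cluster-wide figure comes from the launcher:
num_nodes = world_size // local_world_size
print(f"GPUs visible locally: {local_world_size}")
print(f"Total ranks across {num_nodes} nodes: {world_size}")
```

So seeing 4 on each node is consistent with device_count() being a per-node (per-process) value, while WORLD_SIZE carries the global picture.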
Thanks!