I want to confirm the expected behavior of torch.cuda.device_count() in a multi-node environment. Should it return the device count of a single node, or the total device count across all nodes?
Let’s say I have 2 GPU VMs, each with 4 GPU devices. If I run torch.cuda.device_count() in a multi-node job (for example, one launched with torchrun), the call returns 4 on each node. That means torch.cuda.device_count() returns the device count of a single node rather than the total across all nodes. Is this expected?
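
For reference, here is a minimal sketch of the kind of check described above. It assumes the job is launched with torchrun, which sets the WORLD_SIZE, LOCAL_WORLD_SIZE, RANK and LOCAL_RANK environment variables for each worker. The helper names describe_topology and local_gpu_count are hypothetical, and the node count is derived on the assumption that every node runs the same number of processes.

```python
import os


def describe_topology(env=os.environ):
    """Derive process counts from the environment variables torchrun sets.

    Assumption: every node runs the same number of worker processes, so
    the node count is WORLD_SIZE // LOCAL_WORLD_SIZE.
    """
    world_size = int(env.get("WORLD_SIZE", 1))
    local_world_size = int(env.get("LOCAL_WORLD_SIZE", 1))
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "processes_per_node": local_world_size,
        "total_processes": world_size,
        "nodes": world_size // local_world_size,
    }


def local_gpu_count():
    """Return the number of GPUs visible to this process, or None without torch."""
    try:
        import torch
    except ImportError:
        return None
    # Counts only devices on the local machine that this process can see.
    return torch.cuda.device_count()


if __name__ == "__main__":
    info = describe_topology()
    print(
        f"rank {info['rank']} (local rank {info['local_rank']}): "
        f"local GPUs = {local_gpu_count()}, "
        f"processes per node = {info['processes_per_node']}, "
        f"total processes = {info['total_processes']}, "
        f"nodes = {info['nodes']}"
    )
```

In the 2-VM, 4-GPU setup above, launching this with one process per GPU should show a local GPU count of 4 on every rank, while the total across the job comes from WORLD_SIZE rather than from torch.cuda.device_count().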