torch.cuda.device_count() shows only one GPU in MIG setting

Hi, I have an A100 machine with the following MIG configuration:

GPU 0: A100-SXM4-40GB (UUID: GPU-b428bd3e-1cd2-38b1-833a-bae2ac1edf60)
  MIG 7g.40gb Device 0: (UUID: MIG-GPU-b428bd3e-1cd2-38b1-833a-bae2ac1edf60/0/0)
GPU 1: A100-SXM4-40GB (UUID: GPU-67129a8b-fff4-5944-eada-c4f04ef3871b)
  MIG 7g.40gb Device 0: (UUID: MIG-GPU-67129a8b-fff4-5944-eada-c4f04ef3871b/0/0)
GPU 2: A100-SXM4-40GB (UUID: GPU-59efe21f-857a-87de-3d87-74527465b26a)
  MIG 7g.40gb Device 0: (UUID: MIG-GPU-59efe21f-857a-87de-3d87-74527465b26a/0/0)
GPU 3: A100-SXM4-40GB (UUID: GPU-0723161b-142b-001e-e2e5-ba267d31ba0f)
  MIG 7g.40gb Device 0: (UUID: MIG-GPU-0723161b-142b-001e-e2e5-ba267d31ba0f/0/0)
GPU 4: A100-SXM4-40GB (UUID: GPU-0973e5f1-5abc-26c7-d298-baf3bf29c8d0)
GPU 5: A100-SXM4-40GB (UUID: GPU-bfa1ca29-5339-06d7-b9bb-ea503e10f793)
GPU 6: A100-SXM4-40GB (UUID: GPU-9b67e30c-1635-be44-9015-aa650b1a6342)
GPU 7: A100-SXM4-40GB (UUID: GPU-d52b7cce-5485-136e-3f6d-c8184f40bdef)

With this setup, torch.cuda.device_count() reports only one GPU. I need the exact world_size to use DDP. How can I solve this?


This is expected behavior: MIG doesn’t let a single process subscribe to more than one compute instance, so the process sees only one device.
If no GPUs are in MIG mode, all devices should be visible. However, if any of the devices is in MIG mode, the GPUs that are not in MIG mode are not visible.
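
Since each process can only see one MIG device, one possible workaround is to start one process per MIG device and pin each process to its device through CUDA_VISIBLE_DEVICES, passing world_size explicitly instead of deriving it from torch.cuda.device_count(). Below is a minimal sketch of that idea, not a verified DDP recipe. The helper names (`parse_mig_uuids`, `per_process_envs`) and the shortened sample listing are my own for illustration, and I have not verified that NCCL communication works across MIG instances, so treat this only as a way to control device visibility.

```python
import re
from typing import Dict, List

# Matches only MIG device entries such as
# "MIG 7g.40gb Device 0: (UUID: MIG-GPU-.../0/0)" in `nvidia-smi -L` output.
# Plain "GPU n:" lines are skipped because their UUID starts with "GPU-".
MIG_UUID_RE = re.compile(r"\(UUID:\s*(MIG-[^)\s]+)\)")


def parse_mig_uuids(listing: str) -> List[str]:
    """Return the MIG device UUIDs found in `nvidia-smi -L` style text."""
    return MIG_UUID_RE.findall(listing)


def per_process_envs(mig_uuids: List[str]) -> List[Dict[str, str]]:
    """Build one environment override per worker (hypothetical helper).

    Each worker is pinned to exactly one MIG device, and world_size is the
    number of MIG devices rather than torch.cuda.device_count().
    """
    world_size = str(len(mig_uuids))
    return [
        {"CUDA_VISIBLE_DEVICES": uuid, "RANK": str(rank), "WORLD_SIZE": world_size}
        for rank, uuid in enumerate(mig_uuids)
    ]


# Shortened copy of the listing from the question (first two GPUs only).
SAMPLE = """\
GPU 0: A100-SXM4-40GB (UUID: GPU-b428bd3e-1cd2-38b1-833a-bae2ac1edf60)
  MIG 7g.40gb Device 0: (UUID: MIG-GPU-b428bd3e-1cd2-38b1-833a-bae2ac1edf60/0/0)
GPU 1: A100-SXM4-40GB (UUID: GPU-67129a8b-fff4-5944-eada-c4f04ef3871b)
  MIG 7g.40gb Device 0: (UUID: MIG-GPU-67129a8b-fff4-5944-eada-c4f04ef3871b/0/0)
"""

uuids = parse_mig_uuids(SAMPLE)
envs = per_process_envs(uuids)
for env in envs:
    print(env)
```

Each dictionary would be merged into the environment of one worker process before CUDA is initialized there. Inside that worker, torch.cuda.device_count() would then report 1 and the device would be addressed as cuda:0.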