Hi, I have an A100 machine, and its MIG configuration is as follows:
GPU 0: A100-SXM4-40GB (UUID: GPU-b428bd3e-1cd2-38b1-833a-bae2ac1edf60)
MIG 7g.40gb Device 0: (UUID: MIG-GPU-b428bd3e-1cd2-38b1-833a-bae2ac1edf60/0/0)
GPU 1: A100-SXM4-40GB (UUID: GPU-67129a8b-fff4-5944-eada-c4f04ef3871b)
MIG 7g.40gb Device 0: (UUID: MIG-GPU-67129a8b-fff4-5944-eada-c4f04ef3871b/0/0)
GPU 2: A100-SXM4-40GB (UUID: GPU-59efe21f-857a-87de-3d87-74527465b26a)
MIG 7g.40gb Device 0: (UUID: MIG-GPU-59efe21f-857a-87de-3d87-74527465b26a/0/0)
GPU 3: A100-SXM4-40GB (UUID: GPU-0723161b-142b-001e-e2e5-ba267d31ba0f)
MIG 7g.40gb Device 0: (UUID: MIG-GPU-0723161b-142b-001e-e2e5-ba267d31ba0f/0/0)
GPU 4: A100-SXM4-40GB (UUID: GPU-0973e5f1-5abc-26c7-d298-baf3bf29c8d0)
GPU 5: A100-SXM4-40GB (UUID: GPU-bfa1ca29-5339-06d7-b9bb-ea503e10f793)
GPU 6: A100-SXM4-40GB (UUID: GPU-9b67e30c-1635-be44-9015-aa650b1a6342)
GPU 7: A100-SXM4-40GB (UUID: GPU-d52b7cce-5485-136e-3f6d-c8184f40bdef)
However, torch.cuda.device_count() reports only one GPU. I need the exact world_size to use DDP. How can I solve this?
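
To make the mismatch easier to reproduce, here is a minimal sketch that tallies the physical GPUs and MIG devices in a listing like the one above, so the totals can be compared with what PyTorch reports. It assumes the listing is in the `nvidia-smi -L` format; `parse_listing` and `LISTING` are hypothetical names for illustration, not part of PyTorch or any NVIDIA tool, and the sample is shortened to three of the eight GPUs.

```python
import re

# Shortened sample in the same format as the listing above (assumed to be
# `nvidia-smi -L` output): two GPUs with a MIG device, one without.
LISTING = """\
GPU 0: A100-SXM4-40GB (UUID: GPU-b428bd3e-1cd2-38b1-833a-bae2ac1edf60)
  MIG 7g.40gb Device 0: (UUID: MIG-GPU-b428bd3e-1cd2-38b1-833a-bae2ac1edf60/0/0)
GPU 1: A100-SXM4-40GB (UUID: GPU-67129a8b-fff4-5944-eada-c4f04ef3871b)
  MIG 7g.40gb Device 0: (UUID: MIG-GPU-67129a8b-fff4-5944-eada-c4f04ef3871b/0/0)
GPU 4: A100-SXM4-40GB (UUID: GPU-0973e5f1-5abc-26c7-d298-baf3bf29c8d0)
"""

GPU_RE = re.compile(r"^GPU (\d+): .*\(UUID: (GPU-[0-9a-f-]+)\)")
MIG_RE = re.compile(r"^MIG \S+\s+Device\s+\d+: \(UUID: (MIG-[^)]+)\)")


def parse_listing(text):
    """Hypothetical helper: map each GPU index to the MIG UUIDs listed under it."""
    gpus = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        gpu_match = GPU_RE.match(line)
        if gpu_match:
            current = int(gpu_match.group(1))
            gpus[current] = []
            continue
        mig_match = MIG_RE.match(line)
        if mig_match and current is not None:
            # A MIG line belongs to the most recently seen GPU line.
            gpus[current].append(mig_match.group(1))
    return gpus


gpus = parse_listing(LISTING)
mig_gpus = sorted(i for i, migs in gpus.items() if migs)
plain_gpus = sorted(i for i, migs in gpus.items() if not migs)
print("physical GPUs:", len(gpus))
print("GPUs with MIG devices:", mig_gpus)
print("GPUs without MIG devices:", plain_gpus)
```

Running the same tally on the full listing and printing torch.cuda.device_count() next to it shows which devices the process does not see, which is the number I would need to reconcile before choosing world_size.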