Emulate distributed training setup with 1 GPU

I’m writing code for distributed training on multiple GPUs. However, my local devbox has only 1 GPU. Is it possible to emulate a multi-GPU setup on a 1-GPU devbox to test the code locally (especially the parts that use collective communications)?

Hey @justinliu, you can use gloo, which is a CPU backend (see Distributed communication package - torch.distributed in the PyTorch documentation) and supports many of the same collectives as nccl: spawn one process per emulated GPU and initialize the process group with the gloo backend. Does that work?
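To make this concrete, here is a minimal sketch of the gloo approach: it spawns several CPU processes on one machine and runs an `all_reduce` across them, so collective-communication code paths can be exercised without any GPU. The port number and world size are arbitrary choices for this example, not anything required by PyTorch.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    # Each spawned process plays the role of one GPU rank, but all
    # communication happens over the CPU-based gloo backend.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"  # arbitrary free port (assumption)
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Every rank contributes its own rank id; after all_reduce(SUM),
    # each rank should hold 0 + 1 + ... + (world_size - 1).
    t = torch.tensor([float(rank)])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    assert t.item() == sum(range(world_size)), t.item()

    dist.destroy_process_group()

def run_demo(world_size: int) -> None:
    # spawn() joins the workers and re-raises any failure from a rank.
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)

if __name__ == "__main__":
    run_demo(4)  # emulate 4 workers on a single machine
```

Tensors stay on the CPU here; to later switch to real GPUs you would change the backend to `nccl` and move tensors to `cuda:rank`, but the collective calls themselves stay the same.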

There is currently no way to virtualize multiple GPUs from 1 GPU in PyTorch. Some NVIDIA GPUs support MIG (NVIDIA Multi-Instance GPU); however, the collective communications in NCCL do not support it.


Got it. This makes sense. Thank you!