Train multiple models on multiple GPUs

MPI is not necessary here; the torch.distributed package now provides both MPI-style (collective) and RPC-style distributed APIs. It also supports the gloo, mpi, and nccl backends (for the MPI-style API only), so if you don't want extra hassle, those should be sufficient.
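As a rough sketch of what I mean (my own example, not from any particular tutorial, assuming independent models with one GPU per process): spawn one worker per GPU with torch.multiprocessing, initialize a process group with the nccl backend, and let each rank train its own model on its own device. The address/port values and the toy model are placeholders.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn


def train(rank: int, world_size: int) -> None:
    # Rendezvous over localhost TCP; address and port are arbitrary choices.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # nccl backend for GPU; swap in "gloo" if you want to test on CPU.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}")
    model = nn.Linear(10, 1).to(device)   # each rank owns an independent model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(100):                  # dummy training loop on random data
        x = torch.randn(32, 10, device=device)
        y = torch.randn(32, 1, device=device)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()
    mp.spawn(train, args=(n_gpus,), nprocs=n_gpus, join=True)
```

Since the models here are fully independent, the process group is only needed if you later want collectives (e.g. all_reduce to aggregate metrics); if you never communicate between ranks, plain torch.multiprocessing would also do.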