I want to train a large network using model parallelism on multiple machines (multiple GPUs per machine). For that I am following this tutorial:
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html#combine-ddp-with-model-parallelism
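
For context, this is roughly the pattern from that tutorial as I understand it (ToyMpModel and the layer sizes are the tutorial's toy example; I may be paraphrasing it imperfectly):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyMpModel(nn.Module):
    # Model parallelism: the two halves of the network live on different
    # GPUs owned by the same process.
    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.net1 = nn.Linear(10, 10).to(dev0)
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5).to(dev1)

    def forward(self, x):
        x = self.relu(self.net1(x.to(self.dev0)))
        return self.net2(x.to(self.dev1))

# Wrapped in DDP without device_ids, because the module spans two devices:
#   ddp_mp_model = DDP(ToyMpModel(dev0, dev1))
```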
That tutorial doesn't set up any multi-machine cluster, so how will it train on multiple machines? Also, I am not able to understand the following terms in my scenario (I have tried to restate them in a sketch after this list):
world size
rank
spawn
processes
process group
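
Here is how I currently map those terms onto torch.distributed calls; please correct me if this sketch is wrong (the master address/port and the two-node layout are placeholders I made up):

```python
import os
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(local_rank, node_rank, procs_per_node, world_size):
    # rank: the global index of THIS process across all machines.
    rank = node_rank * procs_per_node + local_rank
    # Placeholder rendezvous info; must point at the first node.
    os.environ["MASTER_ADDR"] = "10.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # process group: the set of world_size processes that can exchange
    # tensors; "nccl" is the backend used for GPU collectives.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    # ... build the model-parallel module, wrap it in DDP, train ...
    dist.destroy_process_group()

if __name__ == "__main__":
    num_nodes = 2       # machines in the cluster
    procs_per_node = 2  # processes spawned on this machine
    # world size: the total number of processes across all machines.
    world_size = num_nodes * procs_per_node
    node_rank = 0       # 0 on the first machine, 1 on the second, ...
    # spawn: start procs_per_node worker processes on this machine; each
    # receives its local index (0..procs_per_node-1) as the first argument.
    mp.spawn(worker, args=(node_rank, procs_per_node, world_size),
             nprocs=procs_per_node)
```

Is that the right mental model, and would I then run this same script on every machine with only node_rank changed?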
I have already installed NCCL on all the nodes. How can I make this work?
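
Concretely, would something like this skeleton be the way to wire it up across nodes? I am assuming torchrun sets RANK/WORLD_SIZE/LOCAL_RANK for me (the flags and addresses below are my guesses/placeholders):

```python
# train.py: minimal multi-node skeleton, assuming it is launched on EVERY
# node with something like (node_rank differing per machine):
#   torchrun --nnodes=2 --nproc_per_node=2 --node_rank=0 \
#            --master_addr=10.0.0.1 --master_port=29500 train.py
import os
import torch
import torch.distributed as dist

def main():
    # torchrun exports RANK, WORLD_SIZE and LOCAL_RANK, so the process
    # group can be initialized from the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    print(f"rank {dist.get_rank()} of {dist.get_world_size()}, "
          f"local GPU {local_rank}")
    # With model parallelism, each process would instead own two GPUs
    # (e.g. 2*local_rank and 2*local_rank+1) and nproc_per_node would be
    # half the GPU count.
    # ... build the model-parallel module, wrap it in DDP, train ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```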