Thanks to DDP, I can split the batch data across GPUs on different nodes, and I can also split a model across GPUs within one node. But now I need to split the model across GPUs on different nodes, say two nodes with one GPU per node. Could someone help me with this? I think the main difficulty is that the "local_rank" is 0 on each node, so how do I send different parts of the model to GPUs on different nodes?
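For reference, here is a minimal sketch of the single-node model-parallel setup I have working (the model and layer sizes are just toy examples; the device selection falls back to CPU when two GPUs are not available):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: two halves of a model on two devices in one node.
# With two GPUs these would be 'cuda:0' and 'cuda:1'; fall back to CPU here.
dev0 = torch.device('cuda:0' if torch.cuda.device_count() > 1 else 'cpu')
dev1 = torch.device('cuda:1' if torch.cuda.device_count() > 1 else 'cpu')

class TwoDeviceModel(nn.Module):
    """Toy model split across two devices on a single node."""
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(10, 10).to(dev0)
        self.part2 = nn.Linear(10, 5).to(dev1)

    def forward(self, x):
        # Move activations between devices at the split point.
        x = torch.relu(self.part1(x.to(dev0)))
        return self.part2(x.to(dev1))

model = TwoDeviceModel()
out = model(torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 5])
```

The cross-node case is where I am stuck: here both halves live in one process, so moving tensors between devices is a simple `.to()` call, but across nodes each process only sees its own local devices.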
Thanks a lot!
Unfortunately, cross-host model sharding is not supported yet, but we plan to introduce it in a future version of PyTorch.
Thanks for your quick response, Can!