Use DDP to split the model across two nodes (each node has one GPU)

Knight_Zhang · June 10, 2021, 4:33pm

Hello Guys,
Thanks to DDP, I can split the batch data across GPUs on different nodes, and I can also split the model across GPUs within one node. Now I need to split the model across GPUs on different nodes, say two nodes with one GPU per node. Could someone help me with this? I think the main difficulty is that “local_rank” is 0 on each node, so how do I send different parts of the model to the GPUs on different nodes?
Thanks a lot!

cbalioglu · June 10, 2021, 5:09pm

Unfortunately cross-host model sharding is not supported yet, but we have plans to introduce it in a future version of PyTorch.
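
On the “local_rank” point in the question: local_rank is 0 on both nodes, but the global rank is not, so a manual setup can choose the model part by global rank and the device by local rank. Below is a minimal sketch of just that selection step, not something DDP does for you. It assumes the launcher sets the `RANK`, `LOCAL_RANK` and `WORLD_SIZE` environment variables; `pick_stage`, `placement` and the layer names are hypothetical, made up for illustration.

```python
import os


def pick_stage(layers, rank, world_size):
    """Return the contiguous slice of `layers` owned by global rank `rank`.

    Hypothetical helper: layers are split as evenly as possible, with the
    first `extra` ranks taking one additional layer.
    """
    per_rank, extra = divmod(len(layers), world_size)
    start = rank * per_rank + min(rank, extra)
    end = start + per_rank + (1 if rank < extra else 0)
    return layers[start:end]


def placement(env=os.environ):
    """Read (global rank, local rank, world size) from the environment.

    Assumes the launcher exports RANK, LOCAL_RANK and WORLD_SIZE.
    """
    rank = int(env.get("RANK", 0))
    local_rank = int(env.get("LOCAL_RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


# Placeholder layer names standing in for real modules.
layers = ["conv1", "conv2", "fc1", "fc2"]

# Simulate the two nodes from the question: one GPU each, so LOCAL_RANK is 0
# on both, while RANK differs.
for node_env in (
    {"RANK": "0", "LOCAL_RANK": "0", "WORLD_SIZE": "2"},
    {"RANK": "1", "LOCAL_RANK": "0", "WORLD_SIZE": "2"},
):
    rank, local_rank, world_size = placement(node_env)
    stage = pick_stage(layers, rank, world_size)
    print(f"rank {rank}: device cuda:{local_rank}, layers {stage}")
```

This only decides which layers each process would own. Passing activations and gradients between the two nodes would still have to be done by hand (for example with point-to-point calls in `torch.distributed`), which is the part that has no built-in support here.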

Knight_Zhang · June 10, 2021, 5:18pm

Thanks for your quick response, Can!