Distributed Model Parallel Using Distributed RPC

To add my two cents to the discussion above: ideally, you shouldn't specialize your code to handle transfers between GPUs on the same node differently from transfers between different nodes. Doing so couples your code to your deployment, meaning you would need to rewrite parts of it to move from 4-GPU hosts to single-GPU hosts, and so on. With the TensorPipe agent you can perform RPC calls between GPUs on the same node, and the data will still be transferred over NVLink, just as if you had called t.to(…). So, with no performance overhead, you get code that is resilient to topology changes.
