Gradients of tensor received from another GPU

Hi,

I was planning to process some data on a GPU (let me call it GPU1) and then send a tensor from GPU1 to another GPU (GPU2) using torch.distributed.send and torch.distributed.recv.

I was wondering, does the received tensor on GPU2 keep the gradient history from its previous life on GPU1? Is it possible to apply backpropagation?

Thanks in advance for your help

Your use case sounds like model parallelism, so I'm not sure you really need send/recv; you might be able to use a simple model-parallel setup instead, roughly like the sketch below.
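
For reference, a minimal single-process model-parallel sketch (not your exact use case, just to show what I mean): the `.to()` transfer between devices is itself differentiable, so autograd carries gradients across GPUs without any explicit send/recv. The module and device names are only illustrative and assume two visible GPUs.

```python
import torch
import torch.nn as nn

# Minimal model-parallel sketch: two stages on two GPUs in one process.
# The .to("cuda:1") move of the activation is recorded by autograd, so
# backward() flows across devices without explicit send/recv.
class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Linear(16, 32).to("cuda:0")
        self.stage2 = nn.Linear(32, 8).to("cuda:1")

    def forward(self, x):
        h = torch.relu(self.stage1(x.to("cuda:0")))
        return self.stage2(h.to("cuda:1"))

model = TwoGPUModel()
out = model(torch.randn(4, 16))
out.sum().backward()  # parameters on both GPUs receive gradients
```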

Thanks for your reply. Sorry for the incomplete explanation of the problem. What I want is not quite the same as your example. I want to code a parallel solver, so each GPU solves part of the problem in parallel, and after some operations some of them have to exchange their tensors in pairs. I was wondering whether the gradients of the tensors being transferred are lost, or whether they can still be backpropagated.

There is no backpropagation for send and recv. You can use the RPC framework (Distributed RPC Framework — PyTorch 1.9.0 documentation), which will allow you to backpropagate across RPC calls.
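
To make that concrete, here is a minimal sketch along the lines of the distributed autograd docs. It assumes two processes that have already called rpc.init_rpc; the names "worker0"/"worker1" and the remote_square helper are just placeholders:

```python
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

# Placeholder function executed on the remote worker; it must be importable
# on both processes.
def remote_square(x):
    return (x * x).sum()

def run_worker0():
    # Assumes rpc.init_rpc("worker0", rank=0, world_size=2) was called here
    # and rpc.init_rpc("worker1", rank=1, world_size=2) on the peer.
    t = torch.randn(3, 3, requires_grad=True)
    with dist_autograd.context() as context_id:
        loss = rpc.rpc_sync("worker1", remote_square, args=(t,))
        # The distributed backward pass runs across the RPC boundary.
        dist_autograd.backward(context_id, [loss])
        # Gradients are stored per autograd context, not in t.grad.
        grads = dist_autograd.get_gradients(context_id)
        print(grads[t])
```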

Alternatively, you could implement this yourself via autograd functions, e.g. pytorch/functional.py at master · pytorch/pytorch · GitHub. You can find the docs for autograd functions here: Automatic differentiation package - torch.autograd — PyTorch 1.9.0 documentation
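
If you go the custom-autograd-function route, the rough idea is to pair the forward transfer with the opposite transfer in backward. The sketch below is untested and not an existing PyTorch API: SendToPeer, RecvFromPeer, and the run_* helpers are made-up names, a process group is assumed to have been initialized with dist.init_process_group (CPU tensors with the gloo backend here; with NCCL you would use CUDA tensors on each rank's device):

```python
import torch
import torch.distributed as dist
from torch.autograd import Function

class SendToPeer(Function):
    """Send a tensor to `dst`; its gradient comes back in backward()."""
    @staticmethod
    def forward(ctx, tensor, dst):
        ctx.dst = dst
        ctx.meta = (tensor.shape, tensor.dtype, tensor.device)
        dist.send(tensor.contiguous(), dst)
        # Scalar handle: calling handle.backward() later pulls the gradient
        # from the receiver and continues backprop through `tensor`'s graph.
        return tensor.new_zeros(())

    @staticmethod
    def backward(ctx, _grad_of_handle):
        shape, dtype, device = ctx.meta
        grad = torch.empty(shape, dtype=dtype, device=device)
        dist.recv(grad, ctx.dst)
        return grad, None

class RecvFromPeer(Function):
    """Receive a tensor from `src`; backward() ships the gradient back."""
    @staticmethod
    def forward(ctx, dummy, src, shape, dtype, device):
        ctx.src = src
        buf = torch.empty(shape, dtype=dtype, device=device)
        dist.recv(buf, src)
        return buf

    @staticmethod
    def backward(ctx, grad_output):
        dist.send(grad_output.contiguous(), ctx.src)
        return None, None, None, None, None

def run_sender(dst=1):
    x = torch.randn(4, 8, requires_grad=True)
    y = x * 2                           # some local work on this rank
    handle = SendToPeer.apply(y, dst)
    handle.backward()                   # blocks until the peer sends dL/dy back
    print(x.grad)                       # gradient flows on through the local graph

def run_receiver(src=0):
    dummy = torch.zeros(0, requires_grad=True)  # keeps the recv op in the graph
    y = RecvFromPeer.apply(dummy, src, (4, 8), torch.float32, torch.device("cpu"))
    loss = y.sum()
    loss.backward()                     # triggers the send of dL/dy back to `src`
```

The receiver's `dummy` tensor is only there so that autograd attaches a grad_fn to the received tensor; the sender's scalar handle gives you something to call `.backward()` on once the peer has finished its backward pass.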