For development I use a local machine with no GPU, and I have a remote machine with a GPU.
I like to debug my code with my IDE's tools, but I also want access to the GPU.
Something like VS Code over SSH is rather slow, so I want to run my scripts locally and send only some computations to the remote machine. Ideally the API would look like this:
```python
# pytorch will connect to remote machine and start a process for GPU computation there
rpc_init("server_addr")

# all computations with model.parameters() will automagically execute on remote machine
model = Linear(3, 1).to("remote-gpu")

data = [
    (Tensor([1, 2, 3]), 1),  # may call .to("remote-gpu") as well
    (Tensor([4, 5, 6]), 2),  # not too bad
]

# data will be automagically sent to remote machine inside model.__call__(),
# or it is already there if Tensor.to("remote-gpu") was used
for (sample, label) in data:
    result = model(sample)
    loss = compute_loss(label, result)  # this is done on remote machine as well
    optimizer.step()
```
So I would run
`python script.py` on the local machine and use my local debugging tools; all the code would run locally, except that somewhere deep down the tensor operations would make RPC calls to the remote GPU for the actual computation, after which execution would continue on my machine.
Is there an easy API in torch.distributed.rpc to achieve this? If there is no easy one, how can I achieve this with the current API?
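For reference, here is the closest approximation I have found with the existing `torch.distributed.rpc` primitives: `rpc.remote()` creates the module on a named worker and returns an `RRef`, and the `rref.rpc_sync()` proxy forwards method calls to wherever the object lives. This is only a sketch: it runs as a single-process, single-worker demo (`world_size=1`, the "remote" worker is the same process), whereas the real setup would use two processes on two machines with different ranks and names; the worker name `"worker"` and the port are arbitrary.

```python
import os
import torch
import torch.distributed.rpc as rpc

# Single-process demo: rank 0 of a world of size 1 talks to itself.
# In the real setup the model-hosting process would run on the GPU machine.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker", rank=0, world_size=1)

# Construct the module on the named worker; we hold only an RRef handle to it.
# (On a real GPU server you would pass a helper that builds the module and
# calls .cuda() on it, since the constructor runs on that worker.)
model_rref = rpc.remote("worker", torch.nn.Linear, args=(3, 1))

x = torch.tensor([1.0, 2.0, 3.0])
# rpc_sync() returns a proxy: the forward() call executes where the module
# lives and only the result tensor comes back over RPC.
y = model_rref.rpc_sync().forward(x)
print(y.shape)

rpc.shutdown()
```

There is also `torch.distributed.nn.RemoteModule`, which, if I understand the docs correctly, wraps exactly this pattern (remote construction plus transparent forward calls) behind a module-like object, so it may be even closer to the "remote-gpu device" API sketched above.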