How to do a peer-to-peer copy between GPUs?

Is this possible to do? Here’s what I have tried so far, to no avail. I am on device 1 trying to access a variable from device 0.

    torch.cuda.current_device() #prints 1
    outputs.get_device() #prints 0

    torch.cuda.comm.broadcast(outputs, (0, 1))
    outputs.get_device() #prints 0

    outputs.cuda(1)
    outputs.get_device() #prints 0

    outputs.to(1)
    outputs.get_device() #prints 0

    outputs.cuda(device=1)
    outputs.get_device() #prints 0

In case it’s relevant, the outputs variable is the output of an LSTM hosted on GPU 0. I’m trying to move it to GPU 1 for additional computation because I think I’m running out of memory on GPU 0.

    outputs = outputs.to(1)

should work, since .to() is not an in-place operation on tensors; you have to assign the result back.
The same goes for the cuda() call.
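
As a minimal sketch (the tensor here is just a stand-in for your actual LSTM output, which isn't shown), the assignment is what makes the move take effect:

    import torch

    # Stand-in tensor for the LSTM output living on GPU 0 (hypothetical shape).
    outputs = torch.randn(4, 8, device="cuda:0")

    # .to() and .cuda() return a new tensor and leave the original untouched,
    # so the result has to be assigned back.
    outputs = outputs.to(1)          # or outputs.cuda(1) / outputs.to("cuda:1")
    print(outputs.get_device())      # prints 1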

PS: you can use it in-place on Modules, but I would recommend always assigning the result.
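
For instance, with a small LSTM just for illustration:

    import torch.nn as nn

    lstm = nn.LSTM(input_size=8, hidden_size=16)

    # Module.to() moves the parameters in place and also returns the module
    # itself, so both lines below work; assigning keeps the style consistent
    # with the tensor case.
    lstm.to("cuda:1")
    lstm = lstm.to("cuda:1")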