Accessing the Single-Segment Buffer Interface

Hi everyone,

I was wondering if there was a way to access the single-segment buffer interface of CUDA Tensors. For example, when using pyCUDA, I used to do:

def buff(ary):
    # pyCUDA: expose the GPUArray's device allocation as a Python buffer
    return ary.gpudata.as_buffer(ary.nbytes)

I’d like to use mpi4py with PyTorch, and while I can use the numpy interface for CPU tensors, this is not possible for GPU ones. This would be a nice trick while we wait for the distributed interface to be polished.

Thanks a lot for your help (and for developing PyTorch),
Séb

You cannot use the buffer interface for CUDA tensors, but you can get the GPU pointer as an int:

x = torch.randn(10).cuda()
print(x.data_ptr())
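
For completeness, a minimal sketch of how you could assemble the (pointer, size) pair from a tensor; data_ptr(), element_size(), and numel() are standard tensor methods:

x = torch.randn(10).cuda()
ptr = x.data_ptr()                     # raw CUDA device pointer as a Python int
nbytes = x.element_size() * x.numel()  # size of the tensor's data in bytes
print(ptr, nbytes)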

Thanks for the fast answer.

Does that mean that I’d have to write my own C buffer wrapper? (Maybe PyBuffer_FromObject: https://docs.python.org/2/c-api/buffer.html#c.PyBuffer_FromObject)

I saw that in distributed you use THDPTensorDesc, but I guess I can’t access it from Python for now. Do you have a suggestion as to what might be the best alternative?

Do you really need a PyBuffer? We attempted implementing the buffer interface, but it is slightly different across many versions of Python and impossible to implement without thousands of lines of code.

Can’t you work with the data pointer and the size of the tensor?

The advantage of the buffer interface is that mpi4py can take advantage of it (essentially making the whole of MPI available). I don’t know of a way to use it directly with an address and a length. Maybe with custom MPI datatypes, but I’d need to investigate that.

Since I can afford a hacky solution (only need send/recv), I’ll try that while waiting for THDP. In any case, I’ll keep this thread up-to-date.
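
For reference, here is a rough sketch of the kind of hack I have in mind: wrap the raw device pointer in a ctypes array so mpi4py sees a buffer. This is only a guess at a workaround; it assumes a CUDA-aware MPI build (otherwise MPI would try to read the device pointer as host memory), and the ctypes object must never be touched from Python:

import ctypes
import torch
from mpi4py import MPI

comm = MPI.COMM_WORLD

def as_buffer(tensor):
    # Wrap the CUDA device pointer in a ctypes array so mpi4py sees a buffer.
    # Do NOT index or slice this object: the address points to GPU memory.
    nbytes = tensor.element_size() * tensor.numel()
    return (ctypes.c_byte * nbytes).from_address(tensor.data_ptr())

x = torch.randn(10).cuda()
if comm.rank == 0:
    comm.Send([as_buffer(x), MPI.BYTE], dest=1)
elif comm.rank == 1:
    comm.Recv([as_buffer(x), MPI.BYTE], source=0)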

Any updates on this? I’m interested in using mpi4py with Tensors. Does Tensor use the PyBuffer interface?

Not that I am aware of. I (quickly) tried to implement a buffer interface at that time, but was not successful.

Nowadays, I would strongly recommend torch.distributed. It’s great and has support for MPI if you really need it.

Does PyTorch work with MPI like other programs would? I.e., would the following work:

$ mpiexec -n 4 python torch_script.py

when torch_script.py contains

torch.distributed.init_process_group(backend='mpi', world_size=4)

Yes, that’s how we run our tests: https://github.com/pytorch/pytorch/blob/master/test/run_test.sh#L98-L108
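
If it helps, here is a rough sketch of what torch_script.py might look like end to end (assuming PyTorch was built with MPI support; with the MPI backend, rank and world size are picked up from mpiexec):

import torch
import torch.distributed as dist

dist.init_process_group(backend='mpi')
rank = dist.get_rank()

t = torch.zeros(10)
if rank == 0:
    dist.send(t + 1, dst=1)   # blocking point-to-point send
elif rank == 1:
    dist.recv(t, src=0)       # receives into t in place
    print('rank 1 got', t)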

To use MPI you have to build PyTorch from source, which I think is kind of annoying.

So, is there a workaround?

torch.distributed’s MPI operations are fairly limited (and require building from source), while mpi4py supports more operations and seems like an excellent library.
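
One workaround (just a sketch, nothing official): stage GPU tensors through CPU memory and let mpi4py use the numpy buffer, which keeps mpi4py's full API at the cost of extra device-to-host copies:

from mpi4py import MPI
import torch

comm = MPI.COMM_WORLD
x = torch.randn(10).cuda()

if comm.rank == 0:
    comm.Send(x.cpu().numpy(), dest=1)   # copy to host, send the numpy view
elif comm.rank == 1:
    buf = torch.empty(10)
    comm.Recv(buf.numpy(), source=0)     # receive into the CPU tensor's storage
    x.copy_(buf)                         # copy the result back to the GPU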