Accessing the Single-Segment Buffer Interface

Hi everyone,

I was wondering if there was a way to access the single-segment buffer interface of CUDA Tensors. For example, when using pyCUDA, I used to do:

def buff(ary):
    return ary.gpudata.as_buffer(ary.nbytes)

I’d like to use mpi4py with PyTorch. I can use the NumPy interface for CPU tensors, but that is not possible for GPU ones. This would be a nice trick while we wait for the distributed interface to be polished.

Thanks a lot for your help (and for developing PyTorch),

You cannot use the buffer interface for CUDA tensors, but you can get the GPU pointer as an int:

x = torch.randn(10).cuda()
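For readers following along, here is a minimal sketch of reading a tensor's address and byte size. It assumes the pointer mentioned above comes from `Tensor.data_ptr()`, which is my reading and not something the snippet in the reply shows. It falls back to the CPU when no GPU is present so that it runs anywhere.

```python
import torch

# Assumption: use the GPU when one is present, otherwise fall back to the CPU
# so the sketch still runs.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(10, device=device)

# data_ptr() returns the address of the first element as a plain Python int.
ptr = x.data_ptr()

# Size of the underlying data in bytes: element count times bytes per element.
nbytes = x.numel() * x.element_size()

print(f"device={x.device} ptr={ptr:#x} nbytes={nbytes}")
```

For a CUDA tensor the integer is a device address, so it cannot be dereferenced from host code. It is only meaningful to libraries that understand GPU pointers.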

Thanks for the fast answer.

Does that mean that I’d have to write my own C buffer wrapper? (Maybe PyBuffer_FromObject:

I saw that in distributed you use THDPTensorDesc, but I guess I can’t access it from Python for now. Do you have a suggestion as to what might be the best alternative?

Do you really need a PyBuffer? We attempted to implement the Buffer interface, but it differs slightly across Python versions and is impossible to implement without thousands of lines of code.

Can’t you work with the data pointer and the size of the tensor?

The advantage of the Buffer interface is that mpi4py can use it directly, which essentially makes the whole of MPI available. I don’t know of a way to use mpi4py with just an address and a length. Maybe with custom DataTypes, but I’d need to investigate that.
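To illustrate the "address and length" idea, here is a small sketch that wraps a raw address and byte count in an object that exposes the buffer protocol, using only `ctypes` from the standard library. The helper name `buffer_from_address` is hypothetical. The example uses host memory (an `array.array`) as a stand-in for a tensor. It would not be valid for a CUDA device pointer, because the host cannot read device memory through such a view.

```python
import array
import ctypes
import struct


def buffer_from_address(address, nbytes):
    """Wrap nbytes of HOST memory starting at address, without copying.

    The returned ctypes array supports the buffer protocol. The caller must
    keep the original owner of the memory alive while the wrapper is in use.
    """
    return (ctypes.c_char * nbytes).from_address(address)


# Host-side stand-in for a tensor: ten float64 values.
values = array.array("d", range(10))
address, length = values.buffer_info()  # (address, number of items)
nbytes = length * values.itemsize

buf = buffer_from_address(address, nbytes)

# The wrapper aliases the same memory: writing through it changes `values`.
struct.pack_into("d", buf, 0, 42.0)
print(values[0], len(buf))
```

An object like `buf` is the kind of thing buffer-based libraries can consume for CPU data. Whether a given MPI build can do anything useful with a GPU address is a separate question that this sketch does not address.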

Since I can afford a hacky solution (only need send/recv), I’ll try that while waiting for THDP. In any case, I’ll keep this thread up-to-date.

Any updates on this? I’m interested in using mpi4py with Tensors. Does Tensor use the PyBuffer interface?

Not that I am aware of. I (quickly) tried to implement a buffer interface at that time, but was not successful.

Nowadays, I would strongly recommend torch.distributed. It’s great and has support for MPI if you really need it.

Does PyTorch work with MPI like other programs would? That is, would the following work:

$ mpiexec -n 4 python

when contains

torch.distributed.init_process_group(backend='mpi', world_size=4)

Yes, that’s how we run our tests:

To use MPI you have to build from source, which I think is kind of annoying.
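Since the MPI backend only exists in builds compiled with MPI support, a script can check for it before initializing. The sketch below is one hedged way to do that. The function name `mpi_allreduce_demo` is hypothetical, and the script is meant to be launched under `mpiexec` as in the question above.

```python
import torch
import torch.distributed as dist


def mpi_allreduce_demo():
    """Sum the ranks of all processes with all_reduce over the MPI backend.

    Returns None when this PyTorch build has no MPI backend.
    """
    if not dist.is_available() or not dist.is_mpi_available():
        print("This PyTorch build has no MPI backend; build from source with MPI.")
        return None

    # With the MPI backend, rank and world size come from the MPI runtime.
    dist.init_process_group(backend="mpi")
    t = torch.tensor([float(dist.get_rank())])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()} of {dist.get_world_size()}: sum = {t.item()}")
    return t.item()


if __name__ == "__main__":
    mpi_allreduce_demo()
```

With four processes, every rank should end up with the same sum of ranks. On a stock binary install the function only prints the notice and returns.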

So, is there a workaround?

The MPI operations in torch.distributed are very limited (and require building from source), while mpi4py supports more operations and seems like an excellent library.