Access result of neural net on GPU (using CUDA) without transfer to host

I have a network that outputs a large tensor, and I would like to use CUDA to do some post-processing on the result of this network without round-tripping the data to host memory. Is this possible with PyTorch? Any pointers would be appreciated.

Hi,

If your network outputs a CUDA tensor, then you can just use it as is.
As long as you don’t explicitly send the tensor to the CPU, print it, or assign its value to a CPU tensor, it will stay on the GPU.
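
For example, here is a minimal sketch (the Linear layer is just a stand-in for your network):

import torch

net = torch.nn.Linear(128, 64).cuda()   # stand-in for your network
x = torch.randn(32, 128, device="cuda")
out = net(x)                  # the output is a CUDA tensor
print(out.device)             # cuda:0 -- nothing was copied to host
post = out.relu().sum(dim=1)  # post-processing ops run on the GPU too
print(post.device)            # cuda:0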

Thanks for the quick reply @albanD. Does this mean that if I have a CUDA tensor I can just pass it as an argument to a PyCUDA kernel, and it will behave like a PyCUDA DeviceAllocation object? Or do I have to do something else?

e.g.

a = torch.LongTensor(10).fill_(3).cuda()  # .cuda() is a method call
myCudaFunc(a,
           block=BLOCK_DIMS,
           grid=GRID_DIMS)

Looks like this is indeed possible: https://gist.github.com/szagoruyko/440c561f7fce5f1b20e6154d801e6033

Really interesting; this seems to be easier than the equivalent in TensorFlow.

Yes, exactly: if you’re using PyCUDA, this is the way to go.
If you use other custom code, you can access the CUDA pointer to your data with t.data_ptr() (you can see how it’s used in the gist you linked).
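
Here’s a minimal sketch of that pattern, modeled on the gist above. The add_one kernel, its launch configuration, and the Holder wrapper are illustrative assumptions, not a fixed API:

import numpy as np
import torch
import pycuda.driver as cuda
import pycuda.autoinit  # sets up a PyCUDA context (the gist relies on the same setup)
from pycuda.compiler import SourceModule

# Wraps a PyTorch CUDA tensor so PyCUDA can pass its device pointer
# straight to a kernel -- no copy to host memory.
class Holder(cuda.PointerHolderBase):
    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor              # keep the tensor alive
        self.gpudata = tensor.data_ptr()

    def get_pointer(self):
        return self.tensor.data_ptr()

mod = SourceModule("""
__global__ void add_one(long long *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1;
}
""")
add_one = mod.get_function("add_one")

a = torch.full((10,), 3, dtype=torch.int64, device="cuda")  # e.g. a network output
add_one(Holder(a), np.int32(a.numel()), block=(32, 1, 1), grid=(1, 1))
torch.cuda.synchronize()  # wait for the kernel before reading the result
print(a)  # tensor([4, 4, ..., 4], device='cuda:0')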
