Access result of neural net on GPU (using CUDA) without transfer to host

I have a network that outputs a large tensor, and I would like to use CUDA to do some post-processing on the result of this network without round-tripping the data to host memory. Is this possible with PyTorch? Any pointers would be appreciated.

Hi,

If your network outputs a CUDA tensor, then you can just use it as is.
As long as you don’t explicitly send the tensor to the CPU, print it, or assign its value to a CPU tensor, it will stay on the GPU.
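
For example, here is a minimal sketch (the Linear layer is just a stand-in for your network):

import torch

net = torch.nn.Linear(128, 64).cuda()   # stand-in for your network
x = torch.randn(32, 128, device="cuda")
out = net(x)                  # the output is a CUDA tensor
print(out.device)             # cuda:0 -- nothing was copied to host
post = out.relu().sum(dim=1)  # post-processing ops run on the GPU too
print(post.device)            # cuda:0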

Thanks for the quick reply @albanD. Does this mean that if I have a CUDA tensor I can just pass it as an argument to a PyCUDA kernel, and it will behave like a PyCUDA DeviceAllocation object? Or do I have to do something else?

e.g.

a = torch.LongTensor(10).fill_(3).cuda()  # .cuda() is a method call
myCudaFunc(a,
           block=BLOCK_DIMS,
           grid=GRID_DIMS)

Looks like this is indeed possible: https://gist.github.com/szagoruyko/440c561f7fce5f1b20e6154d801e6033

Really interesting; this seems to be easier than the equivalent in TensorFlow.

Yes, exactly: if you’re using PyCUDA, this is the way to go.
If you use other custom code, you can access the CUDA pointer to your data with t.data_ptr() (you can see how it’s used in the gist you linked).
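
Here’s a minimal sketch of that pattern, modeled on the gist above. The add_one kernel, its launch configuration, and the Holder wrapper are illustrative assumptions, not a fixed API:

import numpy as np
import torch
import pycuda.driver as cuda
import pycuda.autoinit  # sets up a PyCUDA context (the gist relies on the same setup)
from pycuda.compiler import SourceModule

# Wraps a PyTorch CUDA tensor so PyCUDA can pass its device pointer
# straight to a kernel -- no copy to host memory.
class Holder(cuda.PointerHolderBase):
    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor              # keep the tensor alive
        self.gpudata = tensor.data_ptr()

    def get_pointer(self):
        return self.tensor.data_ptr()

mod = SourceModule("""
__global__ void add_one(long long *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1;
}
""")
add_one = mod.get_function("add_one")

a = torch.full((10,), 3, dtype=torch.int64, device="cuda")  # e.g. a network output
add_one(Holder(a), np.int32(a.numel()), block=(32, 1, 1), grid=(1, 1))
torch.cuda.synchronize()  # wait for the kernel before reading the result
print(a)  # tensor([4, 4, ..., 4], device='cuda:0')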
