Device memory allocation in PyTorch

Can you please explain how device memory allocation works in PyTorch? In this example, https://github.com/sniklaus/pytorch-extension/blob/master/src/HadamardProduct_kernel.cu, only the kernel is launched; the code never copies input1 and input2 to the GPU, and never copies output back to the host. Does PyTorch use UVA (unified virtual addressing) or something like it?

The memory of input1 and input2 is already on the GPU.

If you look at https://github.com/sniklaus/pytorch-extension/blob/master/src/HadamardProduct_kernel.cu#L25-L26, the input is a THCudaTensor, which is a GPU tensor; the caller is responsible for moving the data to the device (e.g. via .cuda()) before the kernel is launched.

PyTorch does not use UVA; host-to-device and device-to-host copies are explicit.
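To illustrate, here is a minimal sketch of how the caller of such an extension manages device memory: the tensors are moved to the GPU explicitly before the operation runs, and the result is copied back explicitly afterwards. The shapes and the element-wise product are just for illustration; the sketch falls back to the CPU when no GPU is available so it still runs.

```python
import torch

# Pick the GPU if one is present; .to("cpu") is then a no-op fallback.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

input1 = torch.randn(4, 3).to(device)  # explicit host -> device copy
input2 = torch.randn(4, 3).to(device)

# Element-wise (Hadamard) product, like the linked kernel computes.
# Both operands already live on `device`, so no implicit transfer happens here.
output = input1 * input2

result = output.cpu()  # explicit device -> host copy
```

This is the same pattern the extension relies on: by the time the kernel sees input1 and input2, they are already THCudaTensors on the device, so the kernel itself contains no memcpy calls.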

Any updates since 2017?