Three questions about accessor

haoran-hash · September 2, 2022, 8:43am

Why use an accessor instead of torch::Tensor in a CUDA kernel function?
What’s the difference between an accessor and a packed accessor?
What’s the difference between packed_accessor32 and packed_accessor64?

I have read https://pytorch.org/tutorials/advanced/cpp_extension.html#using-accessors, but I am still not clear about these questions. I hope someone can help me. Thanks a lot.

eqy · September 2, 2022, 8:13pm

My understanding is that accessors are mainly used for convenience (and potentially safety): they let you index tensors in kernels in more of a “Python” style, so you don’t have to work out the underlying strides for each axis yourself. The main alternative is to access tensors in the “flat” way and do the indexing arithmetic manually. According to the docs, PackedAccessors are the equivalent for CUDA kernels, and I believe 32 vs. 64 refers to the bit width of the underlying flat index (e.g., if you have large tensors with potentially more than INT_MAX elements, the 64-bit variant should be used).
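To make the “flat” vs. accessor-style indexing concrete, here is a minimal sketch in plain C++ with no PyTorch dependency. ToyAccessor2D, flat_at, and both_ways_agree are made-up names for illustration only, not PyTorch’s classes; the sketch just assumes a 2-D float tensor described by a data pointer plus size and stride arrays.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a 2-D accessor (hypothetical, not PyTorch's class).
// It borrows pointers to the data, sizes, and strides; it owns nothing.
struct ToyAccessor2D {
    const float* data;
    const int64_t* sizes;    // {rows, cols}
    const int64_t* strides;  // elements to skip per step along each axis

    // Index by (row, col); the stride arithmetic is hidden in here.
    float at(int64_t i, int64_t j) const {
        assert(i >= 0 && i < sizes[0] && j >= 0 && j < sizes[1]);
        return data[i * strides[0] + j * strides[1]];
    }
};

// The manual alternative: the caller does the stride arithmetic.
inline float flat_at(const float* data, const int64_t* strides,
                     int64_t i, int64_t j) {
    return data[i * strides[0] + j * strides[1]];
}

// Hypothetical check: read a contiguous 2x3 buffer both ways.
inline bool both_ways_agree() {
    const std::vector<float> buf = {0.f, 1.f, 2.f, 3.f, 4.f, 5.f};
    const int64_t sizes[2] = {2, 3};
    const int64_t strides[2] = {3, 1};  // row-major (contiguous) layout
    const ToyAccessor2D acc{buf.data(), sizes, strides};
    for (int64_t i = 0; i < sizes[0]; ++i)
        for (int64_t j = 0; j < sizes[1]; ++j)
            if (acc.at(i, j) != flat_at(buf.data(), strides, i, j))
                return false;
    return acc.at(1, 2) == 5.0f;  // element at flat offset 1*3 + 2*1
}
```

In an actual extension you would not write this yourself: as in the tutorial linked above, you would call something like tensor.accessor<float, 2>() and index the result as a[i][j].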

haoran-hash · September 15, 2022, 1:52pm

I think you have answered questions 1 and 3 very well.
But for question 2, I know PackedAccessors are for CUDA. What I meant to ask is what this sentence in the PyTorch docs means. I’m sorry I didn’t make that clear before.

The fundamental difference with Accessor is that a Packed Accessor copies size and stride data inside of its structure instead of pointing to it. It allows us to pass it to a CUDA kernel function and use its interface inside it.

tom · September 15, 2022, 3:08pm

To do indexing into the tensors, the CUDA kernel (the function running on the GPU) needs to have the stride and size information on the GPU. The PackedAccessor has fields that contain these, so if you use it as a kernel argument, the information will be transferred along with it. This is in contrast to the Accessor, which only holds pointers to the stride and size arrays but in exchange is much cheaper to create because you don’t copy the arrays. But if you pass an Accessor as a kernel argument, it will transfer pointers to CPU memory to the GPU, so it’ll lead to the GPU doing invalid memory accesses.
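The “copies vs. points to” distinction can be sketched in plain C++ without any CUDA. ToyAccessor2D, ToyPackedAccessor2D, and make_packed below are hypothetical stand-ins rather than PyTorch’s real classes, and the index_t template parameter is only meant to mimic the 32- vs. 64-bit choice.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Toy stand-ins (hypothetical, not PyTorch's real classes).

// Accessor style: only points at size/stride arrays that live elsewhere,
// typically in host memory. Cheap to create, but the pointers are useless
// on a device that cannot read that memory.
struct ToyAccessor2D {
    float* data;
    const int64_t* sizes;
    const int64_t* strides;
};

// Packed style: stores its own copies of sizes and strides, so copying the
// struct by value (as happens to kernel arguments) carries them along.
template <typename index_t>
struct ToyPackedAccessor2D {
    float* data;
    index_t sizes[2];
    index_t strides[2];

    float& at(index_t i, index_t j) const {
        return data[i * strides[0] + j * strides[1]];
    }
};

// Kernel arguments are passed by value, so the struct must be copyable
// as plain bytes.
static_assert(std::is_trivially_copyable<ToyPackedAccessor2D<int32_t>>::value,
              "packed accessor must be trivially copyable");
// The 32-bit variant is smaller than the 64-bit one.
static_assert(sizeof(ToyPackedAccessor2D<int32_t>) <
              sizeof(ToyPackedAccessor2D<int64_t>),
              "32-bit indices take less space");

// Hypothetical helper: build a packed accessor from short-lived arrays.
inline ToyPackedAccessor2D<int32_t> make_packed(float* data) {
    const int64_t sizes[2] = {2, 3};    // dies when this function returns
    const int64_t strides[2] = {3, 1};  // dies when this function returns
    ToyPackedAccessor2D<int32_t> p{
        data,
        {static_cast<int32_t>(sizes[0]), static_cast<int32_t>(sizes[1])},
        {static_cast<int32_t>(strides[0]), static_cast<int32_t>(strides[1])}};
    return p;  // the copies inside p remain valid
}
```

The packed struct stays usable after the original size and stride arrays are gone, which is the property a kernel argument needs. In a real extension the tutorial’s form is something like tensor.packed_accessor32<scalar_t, 3, torch::RestrictPtrTraits>(), with packed_accessor64 as the 64-bit counterpart.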

Best regards

Thomas

haoran-hash · September 16, 2022, 2:00am

Thanks for your detailed answer. I fully understand it now!