Packed_accessors in PyTorch C++

Ipsum · April 22, 2021, 3:26am

Hello, I am wondering if there is a way to use packed_accessors to quickly index CUDA tensors without having to resort to frequent, hand-written CUDA kernel calls. For instance, if I have a 1-D tensor with 100 elements and I need to get to the 50th element, using a packed_accessor inside a kernel is much faster than writing tensor[50] in my regular C++ function (along with the appropriate type casting). I am sorry if this is an overly simplistic question, but I am new to C++ and am hoping to avoid writing many small CUDA kernels just to access values in a time-sensitive manner. Thank you for any help!
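
To make the comparison concrete, here is a rough sketch of how I understand an accessor to work: it is a raw data pointer plus size and stride, so indexing is plain pointer arithmetic. By contrast, tensor[50] produces another tensor that still has to be converted to a C++ value. Note that `Accessor1D` and `read_element` are hypothetical names I made up for illustration, not PyTorch's actual classes, and the sketch runs on the host over a plain `std::vector` rather than on a CUDA tensor.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 1-D accessor (NOT PyTorch's real class):
// a raw pointer to the data plus the size and stride of the one dimension.
template <typename T>
struct Accessor1D {
    T* data;
    std::int64_t size;
    std::int64_t stride;

    // Indexing is a single pointer offset; no intermediate object is created.
    T& operator[](std::int64_t i) const { return data[i * stride]; }
};

// Hypothetical helper: fill 100 floats with 0, 1, ..., 99 and read one of
// them back through the accessor.
float read_element(std::int64_t index) {
    std::vector<float> storage(100);
    for (std::int64_t i = 0; i < 100; ++i) {
        storage[i] = static_cast<float>(i);
    }
    Accessor1D<float> acc{storage.data(), 100, 1};
    return acc[index];
}
```

In a real kernel the pointer would refer to device memory, which is why, as far as I can tell, the accessor can only be dereferenced inside device code.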