CUDA extension for non-contiguous tensors

Hi All,

I am writing a CUDA kernel for my project. I have been following the PyTorch CUDA extension tutorials. However, as I understand it, such an approach only supports operations on contiguous tensors. How can I improve my extension to support non-contiguous tensors?

Pointers to the code where PyTorch itself supports non-contiguous tensors would also be very helpful.

Thank you in advance!

Hi,

First of all, calling .contiguous() on the input will make sure you have a contiguous tensor, and the cost won't be noticeable for most workloads. I would recommend this solution as it is much simpler and may actually be faster than the non-contiguous counterpart.
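For illustration, here is a minimal sketch of that approach in the C++ wrapper of an extension; the function names (`my_op_forward`, `my_op_cuda_forward`) are placeholders, not anything from the tutorials:

```cpp
#include <torch/extension.h>

// Forward declaration of the kernel launcher defined in the .cu file;
// the name is hypothetical for this sketch.
torch::Tensor my_op_cuda_forward(torch::Tensor input);

torch::Tensor my_op_forward(torch::Tensor input) {
  // .contiguous() is a no-op for an already-contiguous tensor and otherwise
  // materializes a contiguous copy, so the kernel can keep indexing linearly.
  return my_op_cuda_forward(input.contiguous());
}
```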

To support non-contiguous tensors, you would need to access each element by properly taking the stride of each dimension into account: the element at [ind0, ind1] lives at data_ptr + storage_offset + ind0*stride0 + ind1*stride1. The catch is that this can turn reads that would have been coalesced in CUDA into non-coalesced ones and hurt your kernel's performance.
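As a sketch of what that indexing looks like, here is a small strided kernel, assuming a 2D float32 CUDA tensor and an illustrative "double every element" op; all names are hypothetical:

```cpp
#include <torch/extension.h>

__global__ void double_strided_kernel(const float* in, float* out,
                                      int64_t rows, int64_t cols,
                                      int64_t stride0, int64_t stride1) {
  const int64_t row = blockIdx.y * blockDim.y + threadIdx.y;
  const int64_t col = blockIdx.x * blockDim.x + threadIdx.x;
  if (row < rows && col < cols) {
    // Strided read: element [row, col] sits at row*stride0 + col*stride1
    // (strides counted in elements, not bytes). The output is allocated
    // contiguous, so it is indexed linearly.
    out[row * cols + col] = 2.0f * in[row * stride0 + col * stride1];
  }
}

torch::Tensor double_strided(torch::Tensor input) {
  TORCH_CHECK(input.is_cuda(), "input must be a CUDA tensor");
  TORCH_CHECK(input.dim() == 2, "this sketch only handles 2D tensors");
  TORCH_CHECK(input.scalar_type() == torch::kFloat, "float32 only");

  const int64_t rows = input.size(0);
  const int64_t cols = input.size(1);
  // torch::empty with explicit sizes yields a contiguous output tensor.
  auto output = torch::empty({rows, cols}, input.options());

  const dim3 block(16, 16);
  const dim3 grid((cols + block.x - 1) / block.x,
                  (rows + block.y - 1) / block.y);
  // data_ptr<float>() already points at the tensor's first element (the
  // storage offset is accounted for), so only the strides are passed in.
  double_strided_kernel<<<grid, block>>>(
      input.data_ptr<float>(), output.data_ptr<float>(),
      rows, cols, input.stride(0), input.stride(1));
  return output;
}
```

For a contiguous input, stride1 is 1 and adjacent threads read adjacent memory, so the reads stay coalesced. For, say, a transposed view, stride1 equals the original row length, so adjacent threads hit addresses far apart, which is exactly the non-coalesced access pattern described above.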
