CUDA extension for non-contiguous Tensor

Hi,

First of all, calling .contiguous() on the input will make sure you have a contiguous tensor, and the copy won’t be noticeable for most workloads. I would recommend this solution as it is much simpler and may actually be faster than the non-contiguous counterpart.

To support non-contiguous tensors, you would need to access each element by properly taking the stride of each dimension into account: val(ind0, ind1) = data_ptr[storage_offset + ind0*stride0 + ind1*stride1]. The catch is that this can turn reads that were coalesced in CUDA into non-coalesced ones and destroy your kernel’s performance.
