Hi!
I’m using ATen and cpp-extensions with PyTorch to work with sparse matrices on the GPU.
But I have run into a problem: the CUDA functions assume matrices have Fortran-style (column-major) ordering, while PyTorch (ATen) tensors are stored in C-style (row-major) order.
Could you tell me how I can change a PyTorch tensor’s ordering?
How exactly does changing a tensor to column-major help sparse matrices on GPU?
I think you could transpose the matrix: matrix.t().contiguous(). That would give you the matrix in column-major order, but I’m not sure that’s what you want.
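To illustrate why this works: m.t() is only a view with swapped strides, and .contiguous() then copies it into a fresh row-major buffer, which is byte-for-byte the original matrix in column-major order. A minimal sketch (the shape, the float dtype, and the device transfer are my assumptions; adapt them to whatever CUDA routine you are calling):

#include <ATen/ATen.h>
#include <iostream>

int main() {
  // 2x3 row-major matrix:
  // 0 1 2
  // 3 4 5
  auto m = at::arange(6, at::kFloat).reshape({2, 3});

  // m.t() is a view with swapped strides; .contiguous() materializes it
  // into a row-major buffer holding 0 3 1 4 2 5, i.e. the original
  // matrix in column-major (Fortran) order.
  auto col_major = m.t().contiguous();

  // Raw pointer you could hand to a column-major CUDA routine
  // (move the tensor with .to(at::kCUDA) first if it lives on the CPU).
  float* ptr = col_major.data_ptr<float>();
  std::cout << ptr[1] << "\n";  // prints 3, i.e. m[1][0]
  return 0;
}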
Sorry for resurrecting this old question, but I have a similar one.
Suppose I have a tensor x with shape (B, C, H, W). Which is better: x.permute({2, 3, 0, 1}).clone() or x.t().contiguous()?
x.t() won’t work on a tensor with 4 dimensions, as at most 2 dims are expected, so you should remove the {} and use the first approach:
x.permute(2, 3, 0, 1).contiguous()
Thank you for pointing that out.
I tried the code without the {} and it threw a compilation error:
error: too many arguments in function call
That is because I am writing in C++, for which the ATen documentation expects an IntArrayRef:
at::Tensor at::permute(const at::Tensor &self, at::IntArrayRef dims)
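For example, this is what compiles for me (a minimal sketch; the sizes are made up):

#include <ATen/ATen.h>

int main() {
  auto x = at::randn({8, 3, 32, 32});  // B, C, H, W (example sizes)

  // {2, 3, 0, 1} is a braced initializer list that binds to the
  // at::IntArrayRef parameter, so the braces are required in C++.
  auto y = x.permute({2, 3, 0, 1}).contiguous();  // shape is now H, W, B, C

  // x.permute(2, 3, 0, 1) does not compile: permute takes a single
  // IntArrayRef, not four integer arguments.
  return 0;
}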
Now I have this additional question:
- Is there any advantage to using clone() over contiguous()? Would they be equivalent performance-wise in this case?
Many thanks!
Ah OK, I didn’t realize this.
.clone() will just create a copy of the tensor and will not make it contiguous, while .contiguous() will create a contiguous (in-memory) copy of the tensor.
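For example (a minimal sketch; the sizes are made up):

#include <ATen/ATen.h>
#include <iostream>

int main() {
  auto x = at::randn({8, 3, 32, 32});
  auto p = x.permute({2, 3, 0, 1});  // a non-contiguous view, no copy yet

  auto c1 = p.clone();       // copies the data but preserves the source
                             // layout by default, so it still reports
                             // is_contiguous() == false
  auto c2 = p.contiguous();  // copies into a fresh row-major buffer

  std::cout << c1.is_contiguous() << "\n";  // 0
  std::cout << c2.is_contiguous() << "\n";  // 1
  return 0;
}

So they are not equivalent here: only .contiguous() gives you the row-major copy. Also note that .contiguous() returns the tensor itself without copying when it is already contiguous, while .clone() always copies.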