Hi!
I’m using ATen and cpp-extensions with PyTorch to work with sparse matrices on the GPU.
But I have run into a problem: the CUDA functions assume matrices have Fortran-style (column-major) ordering, while PyTorch (ATen) tensors are stored in C-style (row-major) order.
Could you tell me how I can change a PyTorch tensor’s ordering?
How exactly does changing a tensor to column-major help sparse matrices on GPU?
I think you could transpose the matrix: matrix.t().contiguous(). That would give you the matrix in column-major order, but I’m not sure that’s what you want.
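To illustrate why this works: m.t() is only a view with swapped strides, and .contiguous() then copies it into a fresh row-major buffer, which is byte-for-byte the original matrix in column-major order. A minimal sketch (the shape, the float dtype, and the device transfer are my assumptions; adapt them to whatever CUDA routine you are calling):

#include <ATen/ATen.h>
#include <iostream>

int main() {
  // 2x3 row-major matrix:
  // 0 1 2
  // 3 4 5
  auto m = at::arange(6, at::kFloat).reshape({2, 3});

  // m.t() is a view with swapped strides; .contiguous() materializes it
  // into a row-major buffer holding 0 3 1 4 2 5, i.e. the original
  // matrix in column-major (Fortran) order.
  auto col_major = m.t().contiguous();

  // Raw pointer you could hand to a column-major CUDA routine
  // (move the tensor with .to(at::kCUDA) first if it lives on the CPU).
  float* ptr = col_major.data_ptr<float>();
  std::cout << ptr[1] << "\n";  // prints 3, i.e. m[1][0]
  return 0;
}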
Sorry for resurrecting this old question, but I have a similar one.
Suppose I have a tensor x with shape (B, C, H, W). Which is better: x.permute({2, 3, 0, 1}).clone() or x.t().contiguous()?
x.t() won’t work on a tensor with 4 dimensions, as at most 2 dims are expected, so you should remove the {} and use the first approach:
x.permute(2, 3, 0, 1).contiguous()
Thank you for pointing that out.
I tried the code without the {} and it threw a compilation error:
error: too many arguments in function call
That is because I am writing in C++, for which the ATen documentation expects an IntArrayRef:
at::Tensor at::permute(const at::Tensor &self, at::IntArrayRef dims)
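For example, this is what compiles for me (a minimal sketch; the sizes are made up):

#include <ATen/ATen.h>

int main() {
  auto x = at::randn({8, 3, 32, 32});  // B, C, H, W (example sizes)

  // {2, 3, 0, 1} is a braced initializer list that binds to the
  // at::IntArrayRef parameter, so the braces are required in C++.
  auto y = x.permute({2, 3, 0, 1}).contiguous();  // shape is now H, W, B, C

  // x.permute(2, 3, 0, 1) does not compile: permute takes a single
  // IntArrayRef, not four integer arguments.
  return 0;
}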
Now I have this additional question:
- Is there any advantage to using clone() over contiguous()? Would they be equivalent performance-wise in this case?
Many thanks!
Ah OK, I didn’t realize this.
.clone() will just create a copy of the tensor and will not make it contiguous, while .contiguous() will create a contiguous (in-memory) copy of the tensor.
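For example (a minimal sketch; the sizes are made up):

#include <ATen/ATen.h>
#include <iostream>

int main() {
  auto x = at::randn({8, 3, 32, 32});
  auto p = x.permute({2, 3, 0, 1});  // a non-contiguous view, no copy yet

  auto c1 = p.clone();       // copies the data but preserves the source
                             // layout by default, so it still reports
                             // is_contiguous() == false
  auto c2 = p.contiguous();  // copies into a fresh row-major buffer

  std::cout << c1.is_contiguous() << "\n";  // 0
  std::cout << c2.is_contiguous() << "\n";  // 1
  return 0;
}

So they are not equivalent here: only .contiguous() gives you the row-major copy. Also note that .contiguous() returns the tensor itself without copying when it is already contiguous, while .clone() always copies.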