How to read memory like PyTorch, even after a transpose?

Hi! I found out that PyTorch can read memory differently than CUDA, like:
a = torch.tensor([1, 2, 3])
b = a.T

If we use this b, it is really transposed. But if we read it in a self-defined CUDA kernel, it is still read like a, in a's shape!

How can I package my kernel like PyTorch does? Something like ATen? Any suggestions? Thank you!!!

I’m not sure what the code snippet should do since a is a 1-dim tensor with a size of 3 and a.T won’t change anything. Reading the memory of a and b should thus yield the same values.
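A quick sketch of that point (assuming a recent PyTorch build): for a 1-dim tensor, .T is a no-op view, so a and b share the same storage and shape.

```python
import torch

a = torch.tensor([1, 2, 3])
b = a.T  # for a 1-dim tensor, .T changes nothing

print(a.shape, b.shape)              # both torch.Size([3])
print(a.data_ptr() == b.data_ptr())  # True: b is a view of the same storage
```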

Could you also explain this sentence a bit more? PyTorch uses C++ and CUDA in its backend, so I’m not sure how else PyTorch “can read memory”.

Oh, yes. Thank you for your reply! My question is more precisely described here: How does nn.linear work in cpp for multi dimension input? (torch._C._nn.linear)
If you could answer my further question, it would be highly appreciated!!!

I think what you mean is that when you do a.t() and then read that memory in your CUDA kernel, a is not actually transposed? I believe this is because a.t() does not actually transpose the data in memory; it just returns a view of the data. So in your CUDA kernel, when you read b, you are still reading a, not the transposed version of a.
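You can see this view behavior directly in a small sketch (the 2x3 tensor m here is just an illustrative example): t() swaps the strides but leaves the underlying buffer untouched, which is why a raw pointer in a CUDA kernel still sees the original layout.

```python
import torch

m = torch.arange(6).reshape(2, 3)
t = m.t()  # transpose: returns a view, no data is moved

print(m.stride(), t.stride())        # (3, 1) vs (1, 3): only the strides change
print(m.data_ptr() == t.data_ptr())  # True: same underlying buffer
print(t.is_contiguous())             # False: raw memory is still in m's order
```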

I ran into the same thing when I was trying to implement a fully connected layer in CUDA. I provided a link to my implementation as a reply to your other thread here. What I did was flatten the array into contiguous memory before calling the CUDA kernel, so that b would be contiguous and correctly transposed inside the kernel.
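A minimal sketch of that fix, using an illustrative 2x3 tensor m: calling .contiguous() on the transposed view materializes a new row-major copy, so a kernel reading the buffer linearly sees the data in transposed order.

```python
import torch

m = torch.arange(6).reshape(2, 3)
t = m.t().contiguous()  # materialize the transpose into new, row-major memory

print(t.is_contiguous())             # True: a kernel can now read it linearly
print(m.data_ptr() == t.data_ptr())  # False: contiguous() copied the data
print(t.flatten().tolist())          # [0, 3, 1, 4, 2, 5]: transposed order
```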