Hi! I found out that PyTorch can read memory differently than CUDA, for example:
a = torch.tensor([1, 2, 3])
b = a.T
If we use this b, it is really transposed. But if we read it in a self-defined CUDA kernel, it will still be read like a, in a's shape!
How can I package my kernel like PyTorch does, something like ATen? Any suggestions? Thank you!!!
I’m not sure what the code snippet should do, since a is a 1-dim tensor with a size of 3, and a.T won’t change anything. Reading the memory of b should thus yield the same values.
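A quick check (my own sketch, not from the thread) confirms this point: for a 1-D tensor, .T is a no-op, so a and b share the same shape, strides, and underlying storage.

```python
import torch

a = torch.tensor([1, 2, 3])
b = a.T  # .T reverses the dims; for a 1-D tensor that changes nothing

print(b.shape == a.shape)            # True: same shape
print(b.stride() == a.stride())      # True: same strides
print(b.data_ptr() == a.data_ptr())  # True: same memory, no copy was made
```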
Could you also explain this sentence a bit more? PyTorch uses C++ and CUDA in its backend, so I’m not sure how else PyTorch “can read memory”.
Oh, yes. Thank you for your reply! My question is described more precisely here: How does nn.linear work in cpp for multi dimension input? (torch._C._nn.linear)
If you could answer my further question, it would be highly appreciated!!!
I think what you mean is: when you call a.t() and then read that memory in your CUDA kernel, a is not actually transposed? I believe this is because a.t() does not actually transpose the data in memory; it just returns a view of the data. So in your CUDA kernel, when you read b, you are still reading a, not the transposed version of a.
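The view behavior described above can be sketched like this (my example, assuming a 2-D tensor so the transpose is meaningful): m.t() only swaps the stride metadata, so a raw pointer into it still walks memory in m's original row-major order, while .contiguous() actually copies the data into transposed layout.

```python
import torch

m = torch.arange(6).reshape(2, 3)  # contiguous, strides (3, 1)
t = m.t()                          # a view: same storage, strides (1, 3)

print(t.data_ptr() == m.data_ptr())  # True: no data was moved
print(t.stride())                    # (1, 3): only the metadata changed
print(t.is_contiguous())             # False: raw reads still see m's layout

# .contiguous() materializes the transpose into new row-major memory,
# which is what a stride-unaware CUDA kernel needs.
tc = t.contiguous()
print(tc.data_ptr() != m.data_ptr())  # True: the data was actually copied
print(tc.is_contiguous())             # True
```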
I ran into the same thing when I was trying to implement a fully connected layer in CUDA. I provided a link to my implementation as a reply to your other thread here. What I did was flatten the array before calling the CUDA kernel, so that b would be contiguous and correctly transposed when the kernel reads it.
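The host-side preparation could look like the sketch below (my illustration; the CUDA kernel itself is not shown, since the point is only how to lay out b before launching it):

```python
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Transpose, force a real row-major copy, then flatten to 1-D so a
# stride-unaware kernel can treat b as a plain contiguous buffer.
b = a.t().contiguous().view(-1)

# The buffer now holds a's columns in order.
print(b.tolist())  # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
```

Passing b (instead of the raw view a.t()) to the kernel is what makes the kernel see the transposed values.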