Hi,
I was wondering: when I launch a kernel and pass a packed_accessor32 as an argument, where is the tensor copied? Does it go to the GPU's global memory, or is it dispatched to the shared memory of each SM?