Does packed_accessor32 store a copy of the tensor in shared memory or in global memory on the GPU?

Hi,

I was wondering: when I launch a kernel and pass a packed_accessor32 as an argument, where is the tensor copied? Does it go to the GPU's global memory, or is it dispatched to the shared memory of each SM?