I note I can use torch.roll to do circular shift for a tensor, but this function seems to generate a copy of the original tensor, which can be expensive if the original tensor is large. Is there a way to do circular shift without generating a copy of the original tensor (the underlying mechanism may be something like moving a head pointer in a circular array)?

I do not believe that pytorch offers a view (or an in-place) version of
roll() (although hypothetically it could).

A head-pointer scheme won’t work with the way pytorch stores tensors.
Pytorch stores tensors (ignoring strides) in row-major form, and to preserve
this storage format, roll() has to reorder the tensor’s elements in memory.
We can use .storage() to probe how a tensor is actually stored.
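Here is a small probe along those lines. It shows that after a roll() the
elements really have been reordered in the (row-major) storage, and that the
result lives in freshly allocated memory rather than sharing the original
tensor’s buffer:

```python
import torch

t = torch.arange(6).reshape(2, 3)
print(t)
# tensor([[0, 1, 2],
#         [3, 4, 5]])
print(list(t.storage()))      # row-major layout: [0, 1, 2, 3, 4, 5]

r = torch.roll(t, 1, dims=1)  # circular shift of each row by one
print(r)
# tensor([[2, 0, 1],
#         [5, 3, 4]])
print(list(r.storage()))      # elements physically reordered: [2, 0, 1, 5, 3, 4]

# roll() allocates new storage -- the data pointers differ
print(t.data_ptr() == r.data_ptr())  # False
```

Because the rolled elements occupy different relative positions in storage,
no choice of offset and strides over the original buffer could describe the
result, which is why a no-copy view version of roll() isn’t possible.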

One could imagine implementing a “head pointer” as part of a more
complicated data structure. Note that such a head pointer couldn’t simply
index into a flat circular array. (Consider what would happen if you rolled,
say, a 5d tensor along its third dimension: the shift wraps around
independently within each slice along that dimension, not around the flat
storage.) But implementing such a scheme would require teaching many of
pytorch’s tensor operations, e.g., matmul(), cumsum(), tensor indexing, and
so on, to understand this head-pointer data structure, at some cost in
efficiency.