Avoiding a for loop of tensor roll and norm computations for faster execution

I need to roll a tensor and compute a norm a fixed number of times. This is straightforward to do with a for loop, as shown below -

for i in range(k):
    y = torch.roll(y, shifts, dims)    # cyclically shift y
    x = torch.norm(y - x, dim=dim)     # norm of the difference from the previous x

But in my use case the tensor is huge, and the whole set of operations has to be repeated a large number of times. I also need to do this computation inside a loss function, so the Python loop has become a considerable bottleneck. Is there a way to avoid the for loop, perhaps using PyTorch's built-in tensor operations? Any suggestions would be highly appreciated.
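One partial vectorization, sketched below under simplifying assumptions (a 1-D tensor and a fixed shift per step), is to materialize all k rolled copies at once with a single index operation and compute the norms in one batched call. Note this only matches the loop when `x` is held fixed; the original loop overwrites `x` each iteration, which is a sequential recurrence that cannot be batched this way. The names `n`, `k`, and `shift` here are illustrative, not from the original post.

```python
import torch

n, k, shift = 8, 3, 2
y = torch.arange(float(n))
x = torch.zeros(n)  # assumed fixed across steps for this sketch

# Row i of idx holds the source indices of y rolled by (i + 1) * shift,
# mirroring torch.roll's convention output[j] = input[(j - shift) % n].
steps = torch.arange(1, k + 1).unsqueeze(1)   # shape (k, 1)
idx = (torch.arange(n) - steps * shift) % n   # shape (k, n)
rolled = y[idx]                               # all k rolls at once, shape (k, n)

# One norm per rolled copy, in a single batched call.
norms = torch.norm(rolled - x, dim=1)         # shape (k,)
```

This trades memory (a (k, n) intermediate) for fewer kernel launches, which is usually the right trade when the loop itself is the bottleneck.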

Running into a similar issue here; did you ever resolve this?

Hi, apologies for the late reply! I did not exactly resolve the issue; I ended up reformulating the problem for that use case to avoid it.