How to group a tensor into 3x3 blocks using a sliding window of stride 1

How do you group a tensor into 3x3 blocks using a sliding window of stride 1?
Basically how can I use torch functions to accelerate this:

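import torch

def group_blocks(x):
    # Collect every 3x3 window (stride 1) over the last two dimensions.
    grp = []
    m, n = x.shape[-2:]
    for i in range(m - 2):  # a 3-wide window fits at m - 2 row positions
        for j in range(n - 2):
            grp.append(x[..., i:(i+3), j:(j+3)])
    # Resulting shape: (..., (m-2)*(n-2), 3, 3)
    return torch.stack(grp, dim=-3)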

Basically I want to apply a filter, but it's not a convolutional one; instead it does some XORing. The only way I can think of doing this is to prepare my 3x3 blocks ahead of time and then XOR them with a 3x3 mask/filter that is suitably broadcast along the leading axes.

Is torch.nn.Unfold (PyTorch 1.8.1 documentation) what I want?
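
For reference, here is a minimal sketch of how torch.nn.functional.unfold could reproduce the loop for a 4-D (N, C, H, W) input. The helper names blocks_loop and blocks_unfold are made up for this illustration, and the example uses a float tensor because I'm not sure every PyTorch version accepts integer dtypes in unfold.

```python
import torch
import torch.nn.functional as F

def blocks_loop(x, k=3):
    """Reference: stack every k x k window (stride 1) along dim -3."""
    m, n = x.shape[-2:]
    grp = [x[..., i:i + k, j:j + k]
           for i in range(m - k + 1)
           for j in range(n - k + 1)]
    return torch.stack(grp, dim=-3)

def blocks_unfold(x, k=3):
    """Same blocks via F.unfold; assumes a 4-D (N, C, H, W) float tensor."""
    N, C, H, W = x.shape
    cols = F.unfold(x, kernel_size=k)        # (N, C*k*k, L), L = (H-k+1)*(W-k+1)
    L = cols.shape[-1]
    # Split the channel axis into (C, k, k), then move L in front of the window axes.
    return cols.view(N, C, k, k, L).permute(0, 1, 4, 2, 3)  # (N, C, L, k, k)

x = torch.arange(2 * 3 * 5 * 6, dtype=torch.float32).reshape(2, 3, 5, 6)
same = torch.equal(blocks_loop(x), blocks_unfold(x))
```

Note that F.unfold materializes all the overlapping windows, so its output holds roughly nine times as many elements as the input.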

So Unfold does exactly what I want, but it is a very memory-hungry way of doing things, since the data is duplicated (each element ends up in as many as nine blocks). It's probably worth writing a C++ extension…
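
One thing that might help before going to C++: the tensor method Tensor.unfold (not the same as nn.Unfold) returns a strided view, so the windows themselves are not copied. Below is a rough sketch under that assumption; sliding_blocks and the mask values are made up for illustration, and uint8 is just one dtype for which ^ is defined.

```python
import torch

def sliding_blocks(x, k=3):
    """Strided view of shape (..., m-k+1, n-k+1, k, k); no data is copied."""
    # First call windows the row axis and appends a size-k axis at the end;
    # the column axis is then at -2, so the second call windows the columns.
    return x.unfold(-2, k, 1).unfold(-2, k, 1)

x = torch.randint(0, 2, (4, 5, 6), dtype=torch.uint8)
# Hypothetical 3x3 mask, chosen only for this example.
mask = torch.tensor([[0, 1, 0],
                     [1, 1, 1],
                     [0, 1, 0]], dtype=torch.uint8)

win = sliding_blocks(x)                     # shape (4, 3, 4, 3, 3)
shares_storage = win.data_ptr() == x.data_ptr()

# The mask broadcasts over the leading axes; the XOR result is a new,
# fully materialized tensor even though `win` itself is only a view.
xored = win ^ mask

# Flattening the two position axes gives the (..., L, 3, 3) layout of the loop,
# but reshape has to copy here because the view is not contiguous.
flat = win.reshape(*x.shape[:-2], -1, 3, 3)
```

The view avoids duplicating the input, but the XOR output is still nine times the input size. If the XOR is followed by a reduction over the 3x3 axes, looping over the nine mask offsets and accumulating might keep the peak memory lower.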