How do you group a tensor into 3x3 blocks using a sliding window of stride 1?

Basically, how can I use torch functions to accelerate this:

```
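import torch

def group_blocks(x):
    # Collect every 3x3 window (stride 1) over the last two dims of x.
    grp = []
    m, n = x.shape[-2:]
    for i in range(m - 2):  # m - 2 valid row offsets for a 3-row window
        for j in range(n - 2):  # n - 2 valid column offsets
            grp.append(x[..., i:(i+3), j:(j+3)])
    # Stack the windows along a new axis: (..., (m-2)*(n-2), 3, 3)
    grp = torch.stack(grp, dim=-3)
    return grp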
```

I want to apply a filter, but it’s not a convolutional one; instead it does some XOR’ing. The only way I can think of doing this is to prepare my 3x3 blocks ahead of time, then XOR them with a 3x3 mask/filter that is suitably broadcast along the first few axes.
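
For what it’s worth, here is a minimal sketch of one way this might be done without a Python loop, using `Tensor.unfold` on the last two dimensions. The names `sliding_blocks` and `xor_filter` are made up for illustration, and I am assuming the input is an integer or bool tensor so that `^` is defined. The windows come out in row-major order (all `j` for the first `i`, then the next `i`), which is what a nested loop over `i` then `j` would give.

```python
import torch

def sliding_blocks(x, k=3):
    """Hypothetical helper: all k x k windows (stride 1) over the last two dims.

    Input shape (..., m, n) -> output shape (..., (m-k+1)*(n-k+1), k, k).
    """
    # Unfold the row dim: (..., m-k+1, n, k). The new window dim is appended last.
    w = x.unfold(-2, k, 1)
    # The column dim is now at index -2, so unfold -2 again: (..., m-k+1, n-k+1, k, k).
    w = w.unfold(-2, k, 1)
    # Merge the two window-position dims into one; this copies if the view is not contiguous.
    return w.flatten(-4, -3)

def xor_filter(x, mask):
    """Hypothetical helper: XOR every k x k window with a (k, k) mask via broadcasting."""
    blocks = sliding_blocks(x, mask.shape[-1])
    return blocks ^ mask  # mask broadcasts over all leading dims

# Example usage with made-up data.
x = torch.randint(0, 2, (4, 8, 8))
mask = torch.tensor([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
out = xor_filter(x, mask)  # shape (4, 36, 3, 3)
```

The two `unfold` calls only create a strided view, so no data is copied until the `flatten`. If memory matters, you could skip the `flatten` and XOR directly against the `(..., m-2, n-2, 3, 3)` view.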