I have a tensor T with shape [B, C, H, W] and I would like to use a “sliding window” to slice it into S sub-tensors, each with spatial shape [h, w], so the output tensor has shape [S, B, C, h, w]. This is similar to a 2D convolution, but without actually multiplying by the weights.

Here is an intuitive example (ignoring the batch and channel dimensions):

```
T = [[a, a, b, b],
     [a, a, b, b],
     [c, c, d, d],
     [c, c, d, d]]
kernel = (2, 2)
stride = (2, 2)
T_s = [[[a, a], [a, a]], [[b, b], [b, b]], [[c, c], [c, c]], [[d, d], [d, d]]]
```

My current solution uses a nested loop and slices each block with the indexer `T[i:i+h, j:j+w]`, but this has proven to be quite inefficient. I’ve been digging through the docs and can’t seem to find an efficient way to do this.
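For concreteness, the loop version looks roughly like the sketch below. It assumes PyTorch; the function name `sliding_windows_loop` and the numeric stand-ins (a=1, b=2, c=3, d=4) are made up for illustration, and the ellipsis index is there only so the same slice works with the batch and channel dimensions present.

```python
import torch

def sliding_windows_loop(T, kernel, stride):
    """Slice T of shape [B, C, H, W] into windows; returns [S, B, C, h, w].

    Hypothetical helper illustrating the nested-loop baseline.
    """
    h, w = kernel
    sh, sw = stride
    H, W = T.shape[-2:]
    windows = []
    # Visit the top-left corner of every window, row by row.
    for i in range(0, H - h + 1, sh):
        for j in range(0, W - w + 1, sw):
            # Keep the leading B and C dimensions, slice only H and W.
            windows.append(T[..., i:i + h, j:j + w])
    # Stack along a new leading dimension of size S.
    return torch.stack(windows, dim=0)

# The 4x4 example with a=1, b=2, c=3, d=4 and B = C = 1.
T = torch.tensor([[1, 1, 2, 2],
                  [1, 1, 2, 2],
                  [3, 3, 4, 4],
                  [3, 3, 4, 4]]).reshape(1, 1, 4, 4)
T_s = sliding_windows_loop(T, kernel=(2, 2), stride=(2, 2))
print(T_s.shape)  # torch.Size([4, 1, 1, 2, 2])
```

The Python-level double loop runs once per window, which is presumably where the time goes when H and W are large.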