Efficiently slicing a tensor like a convolution?

I have a tensor T with shape [B, C, H, W] and I would like to use a “sliding window” to slice this tensor into S sub-tensors of shape [h, w], so the output tensor should have shape [S, B, C, h, w]. This is similar to a 2D convolution, but without actually multiplying by the weights.

Here is an intuitive example (ignoring the batch and channel dimensions):

T = [[ a, a, b, b],
     [ a, a, b, b],
     [ c, c, d, d],
     [ c, c, d, d]] 

kernel = (2,2)
stride = (2,2)

T_s = [[[a,a],[a,a]], [[b,b],[b,b]], [[c,c],[c,c]], [[d,d],[d,d]]]

My current solution uses nested loops and slices each block with the indexer T[i:i+h, j:j+w], but this has proven to be quite inefficient. I’ve been digging through the docs and can’t seem to find an efficient way to do this.
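
For reference, here is roughly what that looks like (a minimal sketch; the helper name slice_loop is illustrative, and the loop bounds assume each window fits within H and W):

import torch

def slice_loop(T, h, w, stride_h, stride_w):
    # Collect one [B, C, h, w] block per window position, then stack into [S, B, C, h, w].
    blocks = []
    for i in range(0, T.size(2) - h + 1, stride_h):
        for j in range(0, T.size(3) - w + 1, stride_w):
            blocks.append(T[:, :, i:i + h, j:j + w])
    return torch.stack(blocks)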

tensor.unfold should yield the desired output:

import torch

B, C, H, W = 2, 3, 4, 4
x = torch.arange(B*C*H*W).view(B, C, H, W)

kernel_h, kernel_w = 2, 2
stride = 2

# unfold dim 2 (H), then dim 3 (W); each call appends a patch dimension at the end
patches = x.unfold(2, kernel_h, stride).unfold(3, kernel_w, stride)
print(patches)  # shape: [B, C, H//stride, W//stride, kernel_h, kernel_w] == [2, 3, 2, 2, 2, 2]
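
If you need the exact [S, B, C, h, w] layout from the question, you can move the two patch-grid dimensions to the front and merge them afterwards (a sketch based on the snippet above):

# patches has shape [B, C, H//stride, W//stride, kernel_h, kernel_w];
# bring the grid dims to the front and flatten them into a single S dimension.
S_patches = patches.permute(2, 3, 0, 1, 4, 5).reshape(-1, B, C, kernel_h, kernel_w)
print(S_patches.shape)  # torch.Size([4, 2, 3, 2, 2]) == [S, B, C, kernel_h, kernel_w]

Note that reshape (rather than view) is needed here, since the permute makes the tensor non-contiguous.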

Just to save others from making my mistake: I was confused at first because there is a function, a method, and a class all called “unfold”, and Google finds #1 first. However, @ptrblck used #3 in the answer above, I think (#1 has different arguments).

  1. torch.nn.functional.unfold (Python function, in torch.nn.functional)
  2. torch.nn.Unfold (Python class, in torch.nn)
  3. torch.Tensor.unfold (Python method, in torch.Tensor)
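
To see why the arguments differ, here is a quick comparison of #1 and #3 (a small sketch; #1 flattens each patch into a column, while #3 keeps patches as sub-tensors):

import torch
import torch.nn.functional as F

x = torch.arange(2 * 3 * 4 * 4.).view(2, 3, 4, 4)

# 1: takes kernel_size/stride for both spatial dims at once and returns
#    [B, C*kh*kw, L], with every patch flattened into a column.
cols = F.unfold(x, kernel_size=2, stride=2)
print(cols.shape)  # torch.Size([2, 12, 4])

# 3: applied once per dimension, keeps each patch as a [kh, kw] sub-tensor.
patches = x.unfold(2, 2, 2).unfold(3, 2, 2)
print(patches.shape)  # torch.Size([2, 3, 2, 2, 2, 2])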