How does nn.Unfold work?

>>> unfold = nn.Unfold(kernel_size=(2, 3))
>>> input = torch.randn(2, 5, 3, 4)
>>> output = unfold(input)

Can someone explain the exact operation of F.unfold, please? I've read the docs, but it's still not clear; I need to understand the math behind it.

Thanks :slight_smile:

Unfold uses a sliding window approach, as shown in this code snippet:

import torch
import torch.nn as nn

unfold = nn.Unfold(kernel_size=(2, 3))
input = torch.randn(2, 5, 3, 4)
output = unfold(input)

output_manual = []
kernel_size = [2, 3]
# sliding window approach
for i in range(input.size(2) - kernel_size[0] + 1):
    for j in range(input.size(3) - kernel_size[1] + 1):
        # index the current patch
        tmp = input[:, :, i:i+kernel_size[0], j:j+kernel_size[1]]
        # flatten the patch and keep the batch dim; shape is [2, 30] afterwards
        tmp = tmp.contiguous().view(tmp.size(0), -1)
        output_manual.append(tmp)

# stack the patches in dim2
output_manual = torch.stack(output_manual, dim=2)

# compare
print((output_manual == output).all())
# > tensor(True)

The manual approach doesn't consider padding, dilation, etc. and just uses the nested for loops to create the current patch/window via indexing.
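For reference, dilation can be folded into the same manual loop by sampling the patch rows/columns with a step. This is a sketch, not part of the original snippet; the dilation values (2, 1) are arbitrary and chosen just for illustration:

```python
import torch
import torch.nn.functional as F

# manual unfold with dilation (padding still ignored)
input = torch.randn(2, 5, 8, 9)
kh, kw = 2, 3   # kernel size
dh, dw = 2, 1   # arbitrary dilation for illustration

patches = []
# with dilation d, a kernel of size k spans d*(k-1)+1 input elements
for i in range(input.size(2) - dh * (kh - 1)):
    for j in range(input.size(3) - dw * (kw - 1)):
        # dilated indexing: step through rows/cols with the dilation as stride
        tmp = input[:, :, i:i + dh * (kh - 1) + 1:dh, j:j + dw * (kw - 1) + 1:dw]
        patches.append(tmp.contiguous().view(tmp.size(0), -1))
manual = torch.stack(patches, dim=2)

ref = F.unfold(input, (kh, kw), dilation=(dh, dw))
print(torch.equal(manual, ref))  # → True
```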

Okay @ptrblck, I think it's a little clearer now. In my case I am applying this unfold operation to a tensor A as given below:

A.shape = torch.Size([16, 309, 128])
A = A.unsqueeze(1)  # that's, I guess, to make it 4-dim for the unfold operation
A_out = F.unfold(A, (7, 128), stride=(1, 128), dilation=(3, 1))
A_out.shape = torch.Size([16, 896, 291])

I am not getting this 291 :frowning: If the dilation factor weren't there, then as per your nested-loop explanation it would be [16, 896, 303], right? Thanks for the reply.
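(For what it's worth, the 291 appears to follow from the dilation-adjusted window size: with dilation d, a kernel of size k spans d*(k - 1) + 1 rows, so here 3*(7 - 1) + 1 = 19 and 309 - 19 + 1 = 291. A quick sketch reproducing the shapes in question:)

```python
import torch
import torch.nn.functional as F

# reproduce the reported shapes
A = torch.randn(16, 309, 128).unsqueeze(1)  # [16, 1, 309, 128]
A_out = F.unfold(A, (7, 128), stride=(1, 128), dilation=(3, 1))

# effective kernel height with dilation 3: 3*(7-1)+1 = 19,
# so the number of vertical positions is 309 - 19 + 1 = 291
# (without dilation it would be 309 - 7 + 1 = 303)
eff_kh = 3 * (7 - 1) + 1
n_positions = 309 - eff_kh + 1
print(A_out.shape, n_positions)  # torch.Size([16, 896, 291]) 291
```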