Split CNN feature map to patches

I'm working on a computer vision model, and in one part of my code I've got 2 images, A and B.
I want to run A and B through a CNN (let's say VGG16) and get the feature map of both images from a specific layer (up to this point it's simple to do).

Now that I've got ft_A and ft_B (the feature maps of both images), I want to split each feature map into patches/blocks. For example:

If ft_A is 128x96x96 (CxWxH), I want to split it into patches of size 16x16, so I will have 6*6 patches of size 128x16x16, in a tensor of size 36x128x16x16 (num_PATCHESxCxW_PATCHxH_PATCH).

Now I will apply some operation to each patch, for example computing each patch's Gram matrix, which turns the 36x128x16x16 patch tensor into one of size 36x128x128.

Finally, my loss function will be the MSE between gram_matrix_A of size 36x128x128 and gram_matrix_B of size 36x128x128.

My question
How can I do that? Is there a fast way to do these operations? Specifically, TensorFlow has the function space_to_batch_nd, which turns the feature map into patches. Is there such a thing in PyTorch, or a good way to implement it?


You could try the torch.nn.functional.unfold function.

import torch
import torch.nn.functional as F

x = torch.randn(1, 128, 96, 96)  # N, C, H, W (add .cuda() to run on a GPU)
o = F.unfold(x, kernel_size=16, stride=16)  # 1, 32768, 36 (N, C*16*16, num_patches)
o = o.view(1, 128, 16, 16, 36)  # 1, 128, 16, 16, 36
o = o.permute(0, 4, 1, 2, 3)  # 1, 36, 128, 16, 16
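
To get from the patches to the loss described in the question, something along these lines might work. It is only a sketch: the helper names split_into_patches and patch_gram are made up for illustration, the feature maps are random stand-ins for the VGG16 activations, and dividing the Gram matrix by C*h*w is an assumption, since the question does not say which normalization (if any) is wanted.

```python
import torch
import torch.nn.functional as F


def split_into_patches(feat, patch):
    """(N, C, H, W) -> (N, num_patches, C, patch, patch), non-overlapping."""
    n, c, h, w = feat.shape
    # Assumption: H and W are multiples of the patch size; otherwise
    # unfold would drop the leftover border without warning.
    assert h % patch == 0 and w % patch == 0
    cols = F.unfold(feat, kernel_size=patch, stride=patch)  # (N, C*patch*patch, L)
    cols = cols.reshape(n, c, patch, patch, -1)             # (N, C, patch, patch, L)
    return cols.permute(0, 4, 1, 2, 3)                      # (N, L, C, patch, patch)


def patch_gram(patches):
    """One Gram matrix per patch: (N, L, C, p, p) -> (N, L, C, C)."""
    n, l, c, ph, pw = patches.shape
    # reshape (not view) because the permute above made the tensor non-contiguous
    flat = patches.reshape(n, l, c, ph * pw)
    gram = flat @ flat.transpose(-1, -2)  # batched matmul over N and L
    return gram / (c * ph * pw)           # normalization choice is an assumption


# Random stand-ins for the two feature maps from the question.
ft_A = torch.randn(1, 128, 96, 96)
ft_B = torch.randn(1, 128, 96, 96)

gram_A = patch_gram(split_into_patches(ft_A, 16))  # (1, 36, 128, 128)
gram_B = patch_gram(split_into_patches(ft_B, 16))  # (1, 36, 128, 128)
loss = F.mse_loss(gram_A, gram_B)
```

The patches come out in row-major order, so patch index i*6 + j corresponds to feat[:, :, 16*i:16*(i+1), 16*j:16*(j+1)]. Everything stays batched, so there is no Python loop over the 36 patches.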

@InnovArul I think it’s what I need, thank you!
I will update if it works or if I face problems!