So I'm working on a computer vision model, and in one part of my code I have two images, A and B:
I want to run A and B through a CNN (let's say VGG16) and get the feature map of each image from a specific layer (up to this point it's simple to do).
Now that I've got ft_A and ft_B (the feature maps of both images), I want to split each feature map into patches/blocks. For example:
If ft_A is 128x96x96 (CxWxH) and I split it into non-overlapping patches of size 16x16, I will have 6*6 = 36 patches of size 128x16x16, in a tensor of size 36x128x16x16 (num_PATCHES x C x W_PATCH x H_PATCH).
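To make the shapes concrete, here is a small NumPy sketch of the split I have in mind (NumPy is used only for illustration, and the patch size is assumed to divide the spatial dims exactly):

```python
import numpy as np

C, W, H, P = 128, 96, 96, 16       # channels, width, height, patch size
ft_A = np.random.rand(C, W, H).astype(np.float32)

# Split the CxWxH map into non-overlapping PxP blocks:
# (C, W//P, P, H//P, P) -> (W//P, H//P, C, P, P) -> (36, C, P, P)
patches = (ft_A.reshape(C, W // P, P, H // P, P)
               .transpose(1, 3, 0, 2, 4)
               .reshape(-1, C, P, P))
print(patches.shape)  # (36, 128, 16, 16)
```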
Now I will do some operation on each patch, for example take each patch's gram matrix, which turns every 128x16x16 patch into a 128x128 matrix, so the whole tensor becomes 36x128x128.
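By "each patch's gram matrix" I mean flattening the spatial dims of each patch and multiplying by the transpose, batched over all patches, something like this NumPy sketch (whether to normalize by C*P*P, as is common in style-transfer losses, is a detail I'm leaving open):

```python
import numpy as np

N, C, P = 36, 128, 16
patches = np.random.rand(N, C, P, P).astype(np.float32)

# Flatten each patch's spatial dims, then take X @ X^T per patch
flat = patches.reshape(N, C, P * P)    # (36, 128, 256)
gram = flat @ flat.transpose(0, 2, 1)  # (36, 128, 128)
print(gram.shape)
```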
Finally, my loss function will be the MSE between gram_matrix_A (of size 36x128x128) and gram_matrix_B (of size 36x128x128).
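The loss itself is just the element-wise MSE over all patches at once, e.g. (NumPy sketch with random stand-ins for the two gram tensors):

```python
import numpy as np

gram_A = np.random.rand(36, 128, 128).astype(np.float32)
gram_B = np.random.rand(36, 128, 128).astype(np.float32)

# Element-wise MSE averaged over all patches and entries
loss = np.mean((gram_A - gram_B) ** 2)
print(float(loss))
```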
How can I do that? Is there a fast way to perform these operations? Specifically, TensorFlow has the function space_to_batch_nd, which turns a feature map into patches; is there something similar in PyTorch, or a good way to implement it?