I want to use a scale pyramid to extract resize-invariant features for an image similarity project. I'm using a ResNet18 trained with triplet loss, and I want to extract features at multiple scales and average-pool them.
But when I pass the image tensor through kornia.geometry.transform.ScalePyramid, it returns images with shape (B, C, NL, H, W).
I'm not sure how to interpret the NL dimension. I need tensors of shape (B, C, H, W). How can I drop this NL dimension?
I also want to train a network that is invariant to scale. Can I use this scale pyramid during training? If so, how do I accommodate the NL dimension?
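Roughly what I'm running (a minimal sketch; image size and ScalePyramid arguments are just placeholders, and the ResNet18 itself is omitted):

```python
import torch
import kornia

x = torch.rand(8, 3, 224, 224)                 # batch of images, (B, C, H, W)
sp = kornia.geometry.transform.ScalePyramid()  # default settings
octaves, sigmas, pixel_dists = sp(x)
print(octaves[0].shape)  # (B, C, NL, H, W): this NL dimension is what confuses me
# my ResNet18 expects plain (B, C, H, W) tensors, so I can't feed octaves[0] directly
```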
NL is the number of scale levels of the scale space; see the SIFT paper, Figure 1, on the left. It means the image is blurred progressively more and more, which for linear filters is equivalent to resizing.
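If you do stick with ScalePyramid, one way to deal with NL is a rough sketch like the one below: it assumes each octave tensor really has the (B, C, NL, H, W) shape you describe, and that backbone is your ResNet18 mapping (N, C, H, W) images to (N, D) embeddings. The idea is to fold the levels into the batch dimension, run the network once per octave, and average the embeddings.

```python
import torch
import kornia

def scalespace_embed(x, backbone):
    # Hypothetical helper: average embeddings over a kornia scale-space pyramid.
    # Assumes each octave tensor is (B, C, NL, H, W).
    sp = kornia.geometry.transform.ScalePyramid()
    octaves, sigmas, pixel_dists = sp(x)
    per_octave = []
    for octave in octaves:
        b, c, nl, h, w = octave.shape
        # fold NL into the batch so the backbone sees ordinary (N, C, H, W) tensors
        levels = octave.permute(0, 2, 1, 3, 4).reshape(b * nl, c, h, w)
        emb = backbone(levels).view(b, nl, -1)          # (B, NL, D)
        per_octave.append(emb.mean(dim=1))              # average over levels
    return torch.stack(per_octave, dim=0).mean(dim=0)   # (B, D), averaged over octaves
```

Everything here is plain differentiable tensor ops, so the same pooling should be usable during training with your triplet loss as well.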
If you just need a pyramid that contains H x W, H/2 x W/2, H/4 x W/4 and so on, use the build_pyramid function from kornia.geometry.transform instead. It does not have the levels dimension, unlike the scale-space pyramid; the mention of it in the documentation is a mistake (fixed in master).
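For example, a minimal sketch (again assuming backbone is your triplet-trained ResNet18 returning (N, D) embeddings; that part is your code, not kornia's):

```python
import torch
import kornia

def pyramid_embed(x, backbone, max_level=4):
    # Gaussian pyramid: full size, H/2 x W/2, H/4 x W/4, ...
    pyr = kornia.geometry.transform.build_pyramid(x, max_level=max_level)
    feats = [backbone(level) for level in pyr]      # each: (B, D)
    return torch.stack(feats, dim=0).mean(dim=0)    # (B, D), averaged over scales
```

Since torchvision's ResNet18 ends in global average pooling, the different spatial sizes per level are fine as long as the smallest level is not too small.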