Anchor box size meaning in AnchorGenerator

millivolt9 · September 23, 2021, 11:14pm

This is a question about how anchor boxes are implemented in PyTorch, as I am new to it. I have read this code, along with a lot of other code in the torch repo:

github.com

pytorch/vision/blob/main/torchvision/models/detection/anchor_utils.py

import math
import torch
from torch import nn, Tensor

from typing import List, Optional
from .image_list import ImageList


class AnchorGenerator(nn.Module):
    """
    Module that generates anchors for a set of feature maps and
    image sizes.

    The module support computing anchors at multiple sizes and aspect ratios
    per feature map. This module assumes aspect ratio = height / width for
    each anchor.

    sizes and aspect_ratios should have the same number of elements, and it should
    correspond to the number of feature maps.


Is the “sizes” argument to AnchorGenerator with respect to the original image size, or with respect to the feature map being output from the backbone?

To be clearer and to simplify, let's say I'm only ever interested in detecting objects that are 32x32 pixels in my input images. So my anchor box aspect ratio will definitely be 1.0, as height = width. But is the size that I put into AnchorGenerator 32? Or do I need to do some math using the backbone (e.g. I have two 2x2 max-pooling layers with stride 2, so the size that I give AnchorGenerator should be 32/(2^2) = 8)?

millivolt9 · September 27, 2021, 3:09pm

@fmassa if you’ve got the time, I’d appreciate a comment on this

Krishna_Bandi · June 8, 2022, 10:48pm

Hi @millivolt9,

I had the same question and did a lot of reading before I finally understood it. The anchor box sizes are with respect to the original image size, not the feature maps. For every pixel in a feature map, multiple anchor boxes are centered on the corresponding pixel in the input image.
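To make the geometry concrete, here is a minimal pure-Python sketch of that layout for the 32x32 case from the question. It is not torchvision's code: the helper names `base_anchor` and `anchors_for_feature_map` are made up for illustration, and it assumes a single feature map with stride 4 (two stride-2 poolings) whose cell (row, col) maps to image position (col * stride, row * stride).

```python
import math


def base_anchor(size, aspect_ratio):
    """Hypothetical helper: one anchor centered at the origin, in input-image pixels."""
    h_ratio = math.sqrt(aspect_ratio)  # aspect ratio = height / width
    w_ratio = 1.0 / h_ratio
    w, h = size * w_ratio, size * h_ratio
    return (-w / 2, -h / 2, w / 2, h / 2)


def anchors_for_feature_map(feat_h, feat_w, stride, size, aspect_ratio):
    """Hypothetical helper: shift the base anchor to every feature-map cell.

    Assumption: cell (row, col) corresponds to image point (col * stride, row * stride).
    """
    x1, y1, x2, y2 = base_anchor(size, aspect_ratio)
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx, cy = col * stride, row * stride  # center in image coordinates
            anchors.append((cx + x1, cy + y1, cx + x2, cy + y2))
    return anchors


# A 64x64 image reduced by two stride-2 poolings gives a 16x16 feature map.
stride = 4
anchors = anchors_for_feature_map(16, 16, stride, size=32, aspect_ratio=1.0)

x1, y1, x2, y2 = anchors[1]  # anchor for cell (row=0, col=1)
print(x2 - x1, y2 - y1)      # box width and height in image pixels
```

The box stays 32 pixels wide regardless of the stride; the stride only controls how far apart neighbouring anchor centers are. So for the case in the question, the size to pass is 32, not 8.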

The sizes and aspect_ratios arguments need to be of the same length, i.e. the number of feature maps. If you use FPN, there are 5 feature maps by default; if you don't use FPN, you have only 1 feature map.

But within one feature map, sizes[i] and aspect_ratios[i] can be anything, and they can differ from those of another feature map. The number of anchor boxes at each location of feature map i is len(sizes[i]) * len(aspect_ratios[i]), so the total for that feature map is this count times its height times its width.
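As a rough illustration of that bookkeeping, the sketch below uses one tuple of sizes and one tuple of aspect ratios per feature map. The values and the feature-map shapes are made up for the example. The product of the two lengths is the number of anchors at each spatial location, and multiplying by the number of locations gives the total for that feature map.

```python
# Illustrative configuration for 5 feature maps (e.g. an FPN backbone).
# One inner tuple per feature map; the outer lengths must match.
sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(sizes)
assert len(sizes) == len(aspect_ratios)

# Anchors at each spatial location of feature map i.
anchors_per_location = [len(s) * len(a) for s, a in zip(sizes, aspect_ratios)]

# Hypothetical (height, width) of each feature map, for illustration only.
feature_map_shapes = [(200, 200), (100, 100), (50, 50), (25, 25), (13, 13)]

# Total anchors on each feature map = locations * anchors per location.
anchors_per_map = [
    h * w * n for (h, w), n in zip(feature_map_shapes, anchors_per_location)
]

print(anchors_per_location)
print(anchors_per_map)
```

Nothing forces the inner tuples to have the same length across feature maps; for example, `sizes[0]` could hold two sizes while `sizes[1]` holds one.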

For a more detailed explanation, refer to the class docstring of AnchorGenerator: vision/anchor_utils.py at a7e4fbdc925a5968988ccadd6dffe7abe274dcdc · pytorch/vision · GitHub

Hope this helps.