Torchvision.models.fasterrcnn boxes values

Xiaoyu_Song · January 18, 2021, 3:20am

Hi, the doc from fasterRCNN said:

boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format,
with values of x between 0 and W and values of y between 0 and H

In this case, do x1, y1, x2, y2 correspond to the coordinates of the lower-left and upper-right corners?
But then they cannot satisfy the 0-W and 0-H range.

My question is, what is x1, y1, x2, y2 positions?
Thank you

user_123454321 · January 18, 2021, 3:29am

(x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner. I am not sure why they would not fulfill the 0-W and 0-H range. Those are the width and height of the image, right? Surely the points cannot go outside the image?
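
To make the corner convention concrete, here is a small sketch in plain Python (no torch needed). The helper name `box_corners` is made up for illustration, and it assumes the usual image convention where the origin is the top-left pixel and y grows downward, so the "top" corner has the smaller y value.

```python
def box_corners(box):
    """Split an [x1, y1, x2, y2] box into its two named corners.

    Hypothetical helper for illustration; assumes the origin is the
    top-left pixel of the image and y increases downward.
    """
    x1, y1, x2, y2 = box
    # A well-formed box has its top-left corner strictly above and to the
    # left of its bottom-right corner.
    if not (x1 < x2 and y1 < y2):
        raise ValueError(f"expected x1 < x2 and y1 < y2, got {box}")
    return {"top_left": (x1, y1), "bottom_right": (x2, y2)}


corners = box_corners([100, 200, 300, 400])
print(corners["top_left"])      # (100, 200)
print(corners["bottom_right"])  # (300, 400)
```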

Xiaoyu_Song · January 18, 2021, 3:41am

emmm like a rectangle

x1: 100 y1: 200
x2: 300 y2: 400

so the w: 200, h: 200

and x2, y2 are already outside the 0-200 range.

user_123454321 · January 18, 2021, 3:43am

Your image shape is (200, 200)? I think you are confusing the image height and width with the rectangle height and width.

Xiaoyu_Song · January 18, 2021, 5:23am

that’s an example… another example:
x1: 100, y1: 100
x2: 400, y2: 300

h: 200, w: 300
x2 and y2 also exceed 0-200

user_123454321 · January 18, 2021, 5:31am

The H and W are the height and width of the image, NOT the rectangle. How can the corners be outside the image?
Please read the doc of fasterrcnn:

Implements Faster R-CNN.
    The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each
    image, and should be in 0-1 range. Different images can have different sizes.
    The behavior of the model changes depending if it is in training or evaluation mode.
    During training, the model expects both the input tensors, as well as a targets (list of dictionary),
    containing:
        - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values of x
          between 0 and W and values of y between 0 and H
        - labels (Int64Tensor[N]): the class label for each ground-truth box

In this example

x1: 100, y1: 100
x2: 400, y2: 300

h: 200, w: 300

surely the image height is at least 300 and the image width at least 400; otherwise the corners would fall outside the image.
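
A quick sketch of the difference, in plain Python. The helper names `box_size` and `fits_in_image` are made up for illustration, and the 480 x 640 image size is just an assumed example; the point is that the box's own h and w are derived from the corners, while the 0-W and 0-H range refers to the image.

```python
def box_size(box):
    """Return (w, h) of the box itself, derived from its corners."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


def fits_in_image(box, image_height, image_width):
    """Check x against the image width W and y against the image height H."""
    x1, y1, x2, y2 = box
    return 0 <= x1 < x2 <= image_width and 0 <= y1 < y2 <= image_height


box = [100, 100, 400, 300]

# The box's own size: w = 300, h = 200.
print(box_size(box))  # (300, 200)

# Using the box's own h and w as if they were the image size fails,
# which is the confusion in this thread.
print(fits_in_image(box, image_height=200, image_width=300))  # False

# With an assumed 480 x 640 (H x W) image the same box is fine.
print(fits_in_image(box, image_height=480, image_width=640))  # True
```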

Xiaoyu_Song · January 18, 2021, 5:39am

Yes, that's what I'm saying. The doc says x and y are between 0-W and 0-H.
But the x and y values in my example are already bigger than 200 (the height).

So is the definition in the doc wrong? Or are the x1, y1, x2, y2 values defined wrongly in my example?

Xiaoyu_Song · January 18, 2021, 5:53am

I see: H and W are the image height and width, not the h and w of the box. Thank you @user_123454321