YOLO v1 how to encode labels

Yolkandwhite · September 28, 2020, 7:16am

Hi. I’m trying to make a model, based on YOLO v1.

However, I’m struggling with the dataset.

I have already built the YOLO network, and its output is 7x7x10.

7x7 is the grid size, and 10 is for the two bounding boxes per cell: 2 * (x, y, w, h, confidence score).

I removed the class parameter, because my work doesn’t need classification. It only needs the bounding box for localization.

My current dataset looks like this: (image, target)

target = (x1, y1, w1, h1, Confidence1, x2, y2, w2, h2, Confidence2)

But I need the targets to be 7x7x10.

How can I encode the targets so that each one lands in the right grid cell?

ruka · September 29, 2020, 10:56am

The formula for the grid cell that contains a target’s center would be

numpy.floor([ y/image_size * grid_size, x/image_size * grid_size ])

Assume your input image is 256 * 256, the center of a target is (y=10, x=123), and your grid size is 7 * 7.
Then the center falls in numpy.floor([ 10/256 * 7, 123/256 * 7 ]) = [0, 3], i.e. grid[0][3]. Note that numpy.floor returns floats, so cast the result to int before using it as an index.
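
To fill the whole 7x7x10 target, the same formula can be applied to every box. The following is only a rough sketch under some assumptions that are not stated in the thread: boxes are given as (x_center, y_center, w, h) in pixels, x and y are stored as offsets inside the cell, w and h are divided by the image size, the same ground-truth box is copied into both box slots, and only the first object that lands in a cell is kept. The name encode_targets is made up for this example.

```python
import numpy as np

def encode_targets(boxes, image_size=256, grid_size=7, boxes_per_cell=2):
    """Encode (x_center, y_center, w, h) pixel boxes into a grid target.

    Hypothetical helper; the layout per cell is assumed to be
    boxes_per_cell * (x, y, w, h, confidence).
    """
    target = np.zeros((grid_size, grid_size, 5 * boxes_per_cell), dtype=np.float32)
    for x, y, w, h in boxes:
        # Center position in grid units, e.g. 3.36 means column 3, offset 0.36.
        gx = x / image_size * grid_size
        gy = y / image_size * grid_size
        # int() floors non-negative values; clamp so a center lying exactly
        # on the right or bottom edge still falls inside the grid.
        col = min(int(gx), grid_size - 1)
        row = min(int(gy), grid_size - 1)
        if target[row, col, 4] == 1.0:
            continue  # assumption: one object per cell, keep the first
        cell = [gx - col, gy - row, w / image_size, h / image_size, 1.0]
        # Assumption: copy the same ground truth into every box slot.
        target[row, col] = np.tile(cell, boxes_per_cell)
    return target

# The example above: center (y=10, x=123) in a 256 * 256 image.
target = encode_targets([(123, 10, 40, 20)])
print(target.shape)  # (7, 7, 10)
print(target[0, 3])  # the only non-zero cell
```

All other cells stay at zero, so their confidence is 0 and the loss can treat them as "no object". If your loss expects a different layout (for example x and y relative to the whole image), only the line that builds cell needs to change.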