I’m working on datasets for YOLOv1.
I have an image with two ground-truth boxes.
box1 = [x1, y1, w1, h1, CS1]
box2 = [x2, y2, w2, h2, CS2]
(CS is the confidence score. I removed the class parameters because I don't need classification, only localization.)
Below is an example of my data.
The model output has shape (1, 7, 7, 10).
How can I make targets for this?
If I want 2 bounding-box predictions for each grid cell, as in the YOLO paper,
should all 7*7 grid cells be set to (x1, y1, w1, h1, CS1, x2, y2, w2, h2, CS2)?
Or should only two specific grid cells be set, one to (x1, y1, w1, h1, CS1, x1, y1, w1, h1, CS1)
and the other to (x2, y2, w2, h2, CS2, x2, y2, w2, h2, CS2)?
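To make the two options concrete, here is a rough sketch of what I mean. It is only an illustration under my own assumptions: (x, y) is the box center normalized to [0, 1] relative to the whole image, the "specific" cell in the second option is the one containing the box center, cells without a box are all zeros, and the function names and sample box values are placeholders I made up. I keep the raw x, y values rather than converting them to cell-relative offsets.

```python
S = 7  # grid size, matching the (1, 7, 7, 10) model output
B = 2  # predictions per grid cell

def build_target_all_cells(boxes):
    """First option: every grid cell holds both boxes (10 values)."""
    flat = [v for box in boxes for v in box]
    return [[list(flat) for _ in range(S)] for _ in range(S)]

def build_target_center_cells(boxes):
    """Second option: only the cell containing a box center holds
    that box, repeated B times; all other cells stay zero.
    Assumes x, y are centers normalized to [0, 1]."""
    target = [[[0.0] * (5 * B) for _ in range(S)] for _ in range(S)]
    for x, y, w, h, cs in boxes:
        col = min(int(x * S), S - 1)  # clamp so x == 1.0 stays in the grid
        row = min(int(y * S), S - 1)
        # If two boxes share a cell, the later one overwrites the earlier.
        target[row][col] = [x, y, w, h, cs] * B
    return target

# Made-up sample values, only to show the layout.
box1 = [0.20, 0.30, 0.10, 0.10, 1.0]
box2 = [0.75, 0.60, 0.20, 0.30, 1.0]

target_a = build_target_all_cells([box1, box2])
target_b = build_target_center_cells([box1, box2])
```

Both targets have shape 7 x 7 x 10, so either one could be compared against the model output after adding a batch dimension. My question is which of the two layouts is the right one.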