I am currently implementing my own train.py for YOLOv3 object detection.
However, I have run into a problem with the labels.
YOLOv3 can detect multiple objects in an image.
In this case, different images may contain different numbers of detected objects, which means different images may have different numbers of ground-truth bounding boxes.
Ideally, I would like to collect the ground-truth bounding boxes of the same image along one dimension.
Assume it is a single-class problem, with labels shaped
[image_index, object_index, parameter], where parameter = [tx, ty, tw, th, Po, Pc1].
For image 1, 5 objects appear in the ground truth, so its label shape is [1, 5, 6].
For image 2, only 2 objects appear in the ground truth, so its label shape is [1, 2, 6].
In this case, the labels for image 1 and image 2 cannot be concatenated into one tensor, because their shapes differ at dimension 1.
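The mismatch can be reproduced outside of any training code; this minimal sketch uses made-up zero tensors with the shapes described above:

```python
import torch

# Hypothetical ground-truth tensors shaped [image_index, object_index, parameter]
label1 = torch.zeros(1, 5, 6)  # image 1: 5 objects, 6 parameters each
label2 = torch.zeros(1, 2, 6)  # image 2: 2 objects, 6 parameters each

try:
    combined = torch.cat([label1, label2], dim=0)
except RuntimeError as e:
    # Fails: sizes must match in every dimension except the one being concatenated
    print(e)
```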
To work around this, I tried collecting them in a Python list:

labels = []
labels.append(label1)
labels.append(label2)
However, the default collate function of torch.utils.data.DataLoader stacks the tensors together, so I got the following error because the labels have different shapes:
RuntimeError: stack expects each tensor to be equal size, but got [1, 16] at entry 0 and [6, 16] at entry 1
Here 16 = 1 + 4 + 11: four box parameters, one objectness score, and 11 class scores.
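The same error can be triggered directly with torch.stack, which is what the default collate does; the two tensors below are stand-ins for an image with 1 box and an image with 6 boxes:

```python
import torch

# Hypothetical per-image label tensors: each row is
# [tx, ty, tw, th, Po, 11 class scores] -> 16 values
label_a = torch.zeros(1, 16)  # image with 1 ground-truth box
label_b = torch.zeros(6, 16)  # image with 6 ground-truth boxes

try:
    batch = torch.stack([label_a, label_b])  # what the default collate attempts
except RuntimeError as e:
    # "stack expects each tensor to be equal size, but got [1, 16] ... and [6, 16] ..."
    print(e)
```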
All my labels come from a single .csv file. I am currently testing train.py with 7 images and 11 classes, and all images are padded and resized to 416x416.
Thanks for your time.