Hello all,
I am wondering what my options are:
-
My goal is object detection. So I have an image and then a list of entries of the form [[class1, x0, y0, width, height], [class2, x0, y0, width, height], …]. The entries denote which objects are in the image and where they are located.
-
The Problem: For each image there is a variable number of objects to detect, which means the lists of entries have different lengths.
-
My Questions: Do I need to pad here? Most of the time I read about padding the input, but in my case the images are all the same size. I tried a different collate_fn where I just zip everything, but this leads to a dimension error in a linear layer I use.
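For what it's worth, the torchvision detection models take targets as a plain Python list with one entry per image rather than a single stacked tensor, so one option is a collate_fn that only stacks the images and leaves the variable-length targets as lists. A minimal sketch (the exact tensor shapes are my assumption based on your description):

```python
import torch

def detection_collate_fn(batch):
    """Stack the images (all the same size) into one tensor, but keep
    the variable-length targets as Python lists, one entry per image."""
    images, boxes, labels = zip(*batch)
    images = torch.stack(images, dim=0)  # (B, C, H, W)
    # Each image keeps its own (N_i, 4) box tensor and (N_i,) label tensor.
    boxes = [torch.as_tensor(b, dtype=torch.float32) for b in boxes]
    labels = [torch.as_tensor(l, dtype=torch.int64).reshape(-1) for l in labels]
    return images, boxes, labels
```

You would then pass this as `collate_fn=detection_collate_fn` to your `DataLoader`; the per-image target tensors stay different lengths, and the loss is computed image by image instead of through a fixed-size linear layer.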
This is how I construct my dataset:
```python
img = Image.open(self.root_dir + "/" + self.prefix + "0" + str(idx) + ".png").convert('RGB')
bounding_boxes = []
class_labels = []
# Each annotation line has the form: "class x0 y0 width height"
with open(self.root_dir + "/" + self.prefix + "0" + str(idx) + ".txt", 'r') as txt_file:
    lines = txt_file.read().splitlines()
for line in lines:  # don't reuse `idx` as the loop variable, it shadows the sample index
    tmp = line.split(" ")
    bounding_boxes.append([int(tmp[1]), int(tmp[2]), int(tmp[3]), int(tmp[4])])
    class_labels.append([int(tmp[0])])
sample = {'image': img, 'bounding_boxes': bounding_boxes, 'class_labels': class_labels}
if self.transform:
    sample = self.transform(sample)
return sample["image"], sample["bounding_boxes"], sample["class_labels"]
```
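If your architecture really needs fixed-size targets (e.g. because of that linear layer), the alternative to a list-based collate is to pad every target list to a fixed maximum count and carry a mask so the padded rows can be ignored in the loss. A minimal sketch, where `max_objects` and the `pad_class` value of -1 are choices I'm assuming, not anything from your setup:

```python
import torch

def pad_targets(boxes, labels, max_objects, pad_class=-1):
    """Pad one image's box/label lists to a fixed length.
    Padded rows get all-zero boxes and class `pad_class`; the
    boolean mask marks which rows are real objects."""
    n = min(len(boxes), max_objects)
    padded_boxes = torch.zeros(max_objects, 4, dtype=torch.float32)
    padded_labels = torch.full((max_objects,), pad_class, dtype=torch.int64)
    mask = torch.zeros(max_objects, dtype=torch.bool)
    if n > 0:
        padded_boxes[:n] = torch.as_tensor(boxes[:n], dtype=torch.float32)
        padded_labels[:n] = torch.as_tensor([l[0] for l in labels[:n]], dtype=torch.int64)
        mask[:n] = True
    return padded_boxes, padded_labels, mask
```

With this, every sample has the same target shape, so the default collate works, and you multiply the per-object loss by the mask so the padding contributes nothing.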
Many thanks