How to batch a dataset with variable target lengths?

Hello all,

I am wondering what my options are:

  1. My goal is object detection. I have an image and a list of entries of the form [[class1, x0, y0, width, height], [class2, x0, y0, width, height], …], where each entry denotes what object appears in the image and where it is located.

  2. The problem: For each image there is a variable number of objects to detect, which means the lists of entries have different lengths.

  3. My questions: Do I need to pad here? Most of what I read about padding concerns the input, but in my case the images are all the same size. I tried a custom collate_fn that just zips everything, but that leads to a dimension error in a linear layer I use. (One padding option is sketched right after this list.)
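One way to handle this is to pad the targets rather than the images: fix an upper bound on the number of objects per image and pad every target list to that length, carrying a mask (or a sentinel label) so the loss can ignore the padded rows. A minimal sketch, assuming a hypothetical MAX_OBJECTS bound and the [class, x0, y0, width, height] format above:

    import torch

    MAX_OBJECTS = 16  # hypothetical upper bound on objects per image

    def pad_targets(bounding_boxes, class_labels):
        # Pad variable-length targets to MAX_OBJECTS rows plus a validity mask.
        n = len(bounding_boxes)
        padded_boxes = torch.zeros(MAX_OBJECTS, 4)
        padded_labels = torch.full((MAX_OBJECTS,), -1, dtype=torch.long)  # -1 = "no object"
        mask = torch.zeros(MAX_OBJECTS, dtype=torch.bool)
        if n > 0:
            padded_boxes[:n] = torch.as_tensor(bounding_boxes, dtype=torch.float32)
            padded_labels[:n] = torch.as_tensor(class_labels, dtype=torch.long).view(-1)
            mask[:n] = True
        return padded_boxes, padded_labels, mask

Once every sample has fixed-size targets, the default collate works and the whole batch stacks into regular tensors.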

This is how I construct my dataset:

        # Inside __getitem__(self, idx):
        img = Image.open(self.root_dir + "/" + self.prefix + "0" + str(idx) + ".png").convert('RGB')

        bounding_boxes = []
        class_labels = []

        # Each line of the label file has the form "class x0 y0 width height".
        with open(self.root_dir + "/" + self.prefix + "0" + str(idx) + ".txt", 'r') as txt_file:
            lines = txt_file.read().splitlines()

        for line in lines:  # note: don't reuse `idx` as the loop variable, it shadows the sample index
            tmp = line.split(" ")
            class_labels.append([int(tmp[0])])
            bounding_boxes.append([int(tmp[1]), int(tmp[2]), int(tmp[3]), int(tmp[4])])

        sample = {'image': img, 'bounding_boxes': bounding_boxes, 'class_labels': class_labels}

        if self.transform:
            sample = self.transform(sample)

        return sample['image'], sample['bounding_boxes'], sample['class_labels']
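The more common alternative for detection is not to pad at all: stack only the images (which all have the same size) and keep the targets as a plain Python list of variable-length tensors. A minimal sketch, assuming the dataset returns the (image, bounding_boxes, class_labels) tuple above and that self.transform has already converted the image to a tensor (dataset below is a hypothetical instance of your Dataset class):

    import torch
    from torch.utils.data import DataLoader

    def detection_collate(batch):
        # Stack the fixed-size images; keep per-image targets at their natural lengths.
        images, boxes, labels = zip(*batch)
        images = torch.stack(images, dim=0)  # (B, C, H, W)
        boxes = [torch.as_tensor(b, dtype=torch.float32) for b in boxes]          # each (N_i, 4)
        labels = [torch.as_tensor(l, dtype=torch.long).view(-1) for l in labels]  # each (N_i,)
        return images, boxes, labels

    loader = DataLoader(dataset, batch_size=8, shuffle=True, collate_fn=detection_collate)

Note that with this approach the ragged targets must never be fed through the network itself; the linear layer only ever sees the stacked images, and the loss is computed per image against that image's own target tensor. That is presumably where the zip-based collate went wrong.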

Many thanks :slight_smile:

I’m facing a similar issue now. Did you solve this?