Is there a way to return a batch of images from the __getitem__ method of the dataset?

Hello. I found that the bottleneck of my training procedure so far is reading data from disk. For an image of size 640×480 I only need a 320×240 crop, so I use random_crop. It would help if I could crop the same image multiple times and pack the crops into a batch in one go. Since __getitem__ should return a tensor of shape CHW, which the DataLoader then packs into NCHW, is there a way to pre-pack multiple crops inside __getitem__? Thanks!

You can set batch_size = 1 in the DataLoader and write your own dataset.

It’s something like:

def __getitem__(self, index):
    path, target = self.imgs[index]
    img = self.loader(path)
    # ......
    crops = []
    for ii in range(self.batch_size):  # number of crops to take from this image
        crops.append(random_crop(img))
    # stack the crops into one (num_crops, C, H, W) tensor
    return torch.stack(crops), target
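
A quick usage sketch for this (assuming the dataset above is instantiated as dataset, which is not shown in the post): the default collate_fn just adds a leading batch dimension, so with batch_size = 1 you can squeeze it away after loading.

from torch.utils.data import DataLoader

# with batch_size=1 the default collate_fn stacks the single item,
# so each batch comes out with shape (1, num_crops, C, H, W)
loader = DataLoader(dataset, batch_size=1, shuffle=True)
for imgs, target in loader:
    imgs = imgs.squeeze(0)  # -> (num_crops, C, H, W)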

Thanks for your reply. However, I need the batch size to be larger than 1… And if I return torch.stack(imgs) in __getitem__, the returned data shape will be NCHW, while it should be CHW…

Then you also need to write your own collate_fn.

something like

def my_collate(batch):
    imgs, targets = zip(*batch)
    # each item is already (num_crops, C, H, W): concatenate the crops along dim 0
    # and repeat each label once per crop so images and targets stay aligned
    return torch.cat(imgs), torch.tensor(targets).repeat_interleave(imgs[0].size(0))

and use it in the DataLoader:

dataloader = DataLoader(dataset, collate_fn=my_collate)
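
Putting the two pieces together, here is a rough self-contained sketch; the MultiCropDataset name, the placeholder file paths and labels, and the 240×320 crop size are made up for illustration, not taken from the posts above.

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image

class MultiCropDataset(Dataset):
    def __init__(self, image_paths, labels, num_crops=4):
        self.imgs = list(zip(image_paths, labels))
        self.num_crops = num_crops
        self.crop = transforms.Compose([
            transforms.RandomCrop((240, 320)),  # (H, W) crop from the 640x480 image
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, index):
        path, target = self.imgs[index]
        img = Image.open(path).convert('RGB')
        # take several random crops of the same image
        crops = [self.crop(img) for _ in range(self.num_crops)]
        return torch.stack(crops), target  # (num_crops, C, H, W), scalar label

def my_collate(batch):
    imgs, targets = zip(*batch)
    return torch.cat(imgs), torch.tensor(targets).repeat_interleave(imgs[0].size(0))

# placeholder inputs; each DataLoader item already contains num_crops crops,
# so the effective batch size is batch_size * num_crops
paths = ["img_0001.jpg", "img_0002.jpg"]
labels = [0, 1]
loader = DataLoader(MultiCropDataset(paths, labels, num_crops=4),
                    batch_size=2, shuffle=True, collate_fn=my_collate)
for imgs, targets in loader:
    print(imgs.shape, targets.shape)  # torch.Size([8, 3, 240, 320]) torch.Size([8])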

Thanks! I got your point.