Custom dataset/dictionary of labels and images

Hi, I have a tricky problem (at least to me) and am not sure how to proceed.

Let’s say I have a dataset of images and I have generated some labels for every batch. These are stored in batches of size b_size.

How this goes for b_size = 32:

  1. Traverse the dataset and generate batches of size 32, so each image batch has a shape like (32, 1, 64, 64).
  2. For every batch I have a set of labels of shape (32, 100, 1, 1).

I want to create a dataset/dictionary that stores these image batches and their corresponding labels, so that when I later iterate over it, it returns the (x, y) pair, where x is the batch of images and y is the corresponding batch of labels.

I would prefer a dataset since I also want to subset it in the future. But I honestly have no idea how to start this.

Any ideas would be appreciated, thanks!

You could check this tutorial which explains how to write a custom Dataset. In the simplest approach, you would implement the logic to load and process a single sample in __getitem__, and the DataLoader would then shuffle the data, create the batches, etc.
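
Here is a minimal sketch of that approach, assuming the images and labels are already available as lists of tensors with the shapes from the question. The class name ImageLabelDataset and the random placeholder data are made up for illustration and are not from the tutorial. The pre-computed batches are concatenated so that __getitem__ returns a single sample, and the DataLoader rebuilds batches of 32:

```python
import torch
from torch.utils.data import Dataset, DataLoader, Subset


class ImageLabelDataset(Dataset):
    """Hypothetical dataset that serves one (image, label) pair per index."""

    def __init__(self, image_batches, label_batches):
        # Concatenate the stored batches along dim 0 so that every index
        # refers to a single sample rather than to a whole batch.
        self.images = torch.cat(image_batches, dim=0)  # (N, 1, 64, 64)
        self.labels = torch.cat(label_batches, dim=0)  # (N, 100, 1, 1)
        assert self.images.size(0) == self.labels.size(0)

    def __len__(self):
        return self.images.size(0)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


# Placeholder data standing in for the real images and generated labels.
image_batches = [torch.randn(32, 1, 64, 64) for _ in range(4)]
label_batches = [torch.randn(32, 100, 1, 1) for _ in range(4)]

dataset = ImageLabelDataset(image_batches, label_batches)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Each iteration yields an (x, y) pair of batched images and labels.
x, y = next(iter(loader))

# Subsetting: keep only the first 64 samples.
subset = Subset(dataset, list(range(64)))
subset_loader = DataLoader(subset, batch_size=32)
```

Since everything fits into two tensors here, torch.utils.data.TensorDataset(images, labels) would likely give the same behavior without a custom class. Writing your own Dataset mainly pays off once you need to load samples lazily or apply transformations in __getitem__.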