I have an image dataset stored as a bunch of JSON files. Each key is a class label, and the corresponding value is the subset of images with that label, represented as a list of integers.
How should I preprocess and save this data so that I can use DataLoader to fetch batches of it during training?
Based on your description, it seems you have multiple JSON files, and each file might contain the same class indices as keys with the corresponding image samples.
If that's the case, you could index into these files using the class label, but the drawback is that mapping a global sample index to the right file, class, and position inside that class can become complicated.
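To make that drawback concrete, here is a minimal sketch of indexing into the original layout directly. It assumes each file maps a label to a list of images, each image being a list of integers, and that all files fit in memory; the class name `ClassKeyedJsonDataset` is hypothetical. Every (file, label) pair becomes a chunk, and a global index is mapped to a chunk via cumulative offsets and a binary search:

```python
import bisect
import json


class ClassKeyedJsonDataset:
    """Map-style dataset over JSON files shaped like {label: [image, image, ...]}.

    Hypothetical sketch. Assumptions: each image is a list of integers and all
    files fit in memory (they are loaded eagerly to keep the example short).
    """

    def __init__(self, json_paths):
        self.chunks = []    # (label, list_of_images), one entry per key per file
        self.offsets = [0]  # offsets[i] = global index of the first sample in chunk i
        for path in sorted(json_paths):
            with open(path) as f:
                data = json.load(f)
            for label, images in data.items():
                self.chunks.append((label, images))
                self.offsets.append(self.offsets[-1] + len(images))

    def __len__(self):
        return self.offsets[-1]

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the chunk i with offsets[i] <= index < offsets[i + 1];
        # bisect_right also skips over empty chunks correctly.
        chunk = bisect.bisect_right(self.offsets, index) - 1
        label, images = self.chunks[chunk]
        return images[index - self.offsets[chunk]], label
```

Note that JSON keys are always strings, so the returned label would still need to be mapped to an integer class index before training.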
Instead, you could iterate your JSON files once and store each image in a subfolder corresponding to its class index. With this approach you could simply collect all image paths and index them directly, lazily loading each sample inside your custom Dataset.
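A minimal sketch of this approach, under the same assumptions (each file maps a label to a list of images, each image a list of integers). The names `export_by_class` and `FolderPerClassDataset` are hypothetical, and each image is written as its own small JSON file only to keep the sketch stdlib-only; a binary format such as `.npy` files or `torch.save` would be smaller and faster to read. The dataset keeps only file paths in memory and reads one sample per `__getitem__` call:

```python
import json
from pathlib import Path


def export_by_class(json_paths, out_dir):
    """One-time preprocessing: write every image to out_dir/<label>/<n>.json."""
    out_dir = Path(out_dir)
    counters = {}  # running sample count per label, so filenames never collide
    for path in sorted(json_paths):
        with open(path) as f:
            data = json.load(f)
        for label, images in data.items():
            class_dir = out_dir / str(label)
            class_dir.mkdir(parents=True, exist_ok=True)
            for image in images:
                n = counters.get(label, 0)
                with open(class_dir / f"{n}.json", "w") as out:
                    json.dump(image, out)
                counters[label] = n + 1


class FolderPerClassDataset:
    """Lazily loads the per-class folders written by export_by_class.

    In a real training script you would inherit from torch.utils.data.Dataset;
    it is left out here so the sketch runs without PyTorch installed.
    """

    def __init__(self, root, transform=None):
        root = Path(root)
        # Sorted folder names define a stable label -> class index mapping.
        self.classes = sorted(p.name for p in root.iterdir() if p.is_dir())
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}
        # Only (path, class index) pairs are kept in memory.
        self.samples = [
            (p, self.class_to_idx[c])
            for c in self.classes
            for p in sorted((root / c).glob("*.json"))
        ]
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, target = self.samples[index]
        with open(path) as f:  # the image is read from disk only when requested
            image = json.load(f)
        if self.transform is not None:
            image = self.transform(image)
        return image, target
```

With PyTorch you would then create the dataset with e.g. `transform=torch.tensor` and wrap it in `DataLoader(dataset, batch_size=..., shuffle=True)`. The default collate function can stack the samples into a batch only if all images have the same length. This folder-per-class layout is also the same convention torchvision's `ImageFolder`/`DatasetFolder` use, if you later decide to store the images in a standard image format.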