What is the best way to create a dataloader for multiple datasets such that each batch is from only one dataset, and return the name of the dataset?

flaviomaia77 · September 25, 2021, 11:25pm

Let's say I have datasets A and B. At runtime, I want to receive batches that come entirely from A or entirely from B (no mixed batches), along with their origin, e.g. a tuple like ('A', BatchFromA).
How can I implement that?

ptrblck · September 26, 2021, 10:12pm

To get the dataset name in each sample, you could return it in the __getitem__ of each Dataset.
A custom sampler might be the best approach to make sure each batch is drawn from a single dataset and that samples from different datasets are never mixed.
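
A rough sketch of both suggestions, in case it helps. The names `NamedDataset` and `SingleSourceBatchSampler` are made up for illustration and are not part of PyTorch. The sketch assumes map-style datasets that are concatenated in the same order as the lengths passed to the sampler, so that each dataset occupies a contiguous range of indices. The sampler itself is plain Python and only produces lists of indices.

```python
import random


class NamedDataset:
    """Hypothetical wrapper: returns (name, sample) from __getitem__."""

    def __init__(self, name, data):
        self.name = name
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Attach the dataset name to every sample.
        return self.name, self.data[idx]


class SingleSourceBatchSampler:
    """Hypothetical batch sampler: every yielded index list stays in one dataset.

    Indices refer to the concatenation of the datasets, in the order
    their lengths are given.
    """

    def __init__(self, lengths, batch_size, shuffle=True, seed=None):
        self.lengths = list(lengths)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        batches = []
        offset = 0
        for n in self.lengths:
            # Global indices belonging to this dataset only.
            indices = list(range(offset, offset + n))
            if self.shuffle:
                self.rng.shuffle(indices)
            # Chunk within the dataset, so batches never cross a boundary.
            for start in range(0, n, self.batch_size):
                batches.append(indices[start:start + self.batch_size])
            offset += n
        if self.shuffle:
            # Interleave batches from the different datasets.
            self.rng.shuffle(batches)
        return iter(batches)

    def __len__(self):
        # Ceiling division per dataset (the last batch may be smaller).
        return sum(-(-n // self.batch_size) for n in self.lengths)


# Intended use with PyTorch (assumes torch is installed; tensor_a and
# tensor_b are placeholders for your own data):
#
#   from torch.utils.data import ConcatDataset, DataLoader
#   a, b = NamedDataset("A", tensor_a), NamedDataset("B", tensor_b)
#   sampler = SingleSourceBatchSampler([len(a), len(b)], batch_size=4)
#   loader = DataLoader(ConcatDataset([a, b]), batch_sampler=sampler)
#   for names, batch in loader:
#       origin = names[0]  # all names in a batch are the same
```

In real code you would probably make `NamedDataset` subclass `torch.utils.data.Dataset`. If you want a single name per batch instead of one name per sample, a custom `collate_fn` could collapse the names.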