Training on multiple datasets without mixing them

I want to train on, say, two datasets ds1 and ds2. However, I want all samples in each batch to come from only one dataset. Hence, concatenating the datasets and using a DataLoader with the default batch sampler and shuffling won't work. What is a good and efficient way to do this?
I can think of two possible solutions:
1- Define a custom batch sampler that first samples a dataset index and then uses that dataset's designated sampler to generate the training batch. This way a single DataLoader can still be consumed by higher-level libraries such as PyTorch Lightning or Hugging Face Transformers.
2- Give each dataset its own DataLoader and write a custom training loop that, at each iteration, picks one of the dataloaders and fetches a batch from it.
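For reference, a rough sketch of what I mean by option 2 (toy tensors in place of real datasets, and the actual training step elided):

```python
import random

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for ds1 and ds2
ds1 = TensorDataset(torch.zeros(8, 3))
ds2 = TensorDataset(torch.ones(6, 3))

loaders = [
    DataLoader(ds1, batch_size=4, shuffle=True),
    DataLoader(ds2, batch_size=4, shuffle=True),
]

# Custom loop: pick a random dataloader each iteration until all are exhausted
iters = [iter(dl) for dl in loaders]
while iters:
    k = random.randrange(len(iters))
    try:
        (x,) = next(iters[k])
    except StopIteration:
        iters.pop(k)  # this dataloader is exhausted, drop it
        continue
    # every batch x comes from a single dataset; run forward/backward here
```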

Any suggestions?

Both approaches sound fine. Depending on your overall use case, you might want to prefer the first approach if you want to make sure higher-level APIs can consume a single DataLoader.
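A minimal sketch of the first approach, assuming the datasets are wrapped in a ConcatDataset (the class name SingleDatasetBatchSampler and its arguments are just placeholders, not an existing PyTorch API):

```python
import random

import torch
from torch.utils.data import ConcatDataset, DataLoader, Sampler, TensorDataset


class SingleDatasetBatchSampler(Sampler):
    """Yields index batches that each come from exactly one dataset
    inside a ConcatDataset of the given sizes."""

    def __init__(self, dataset_sizes, batch_size, shuffle=True):
        self.batch_size = batch_size
        self.shuffle = shuffle
        # Contiguous index ranges of each dataset within the ConcatDataset
        self.ranges = []
        offset = 0
        for size in dataset_sizes:
            self.ranges.append(list(range(offset, offset + size)))
            offset += size

    def __iter__(self):
        batches = []
        for indices in self.ranges:
            indices = indices[:]
            if self.shuffle:
                random.shuffle(indices)
            for i in range(0, len(indices), self.batch_size):
                batches.append(indices[i : i + self.batch_size])
        if self.shuffle:
            random.shuffle(batches)  # interleave batches across datasets
        return iter(batches)

    def __len__(self):
        return sum(
            (len(r) + self.batch_size - 1) // self.batch_size for r in self.ranges
        )


# Toy usage: zeros vs. ones makes mixed batches easy to detect
ds1 = TensorDataset(torch.zeros(10, 3))
ds2 = TensorDataset(torch.ones(7, 3))
concat = ConcatDataset([ds1, ds2])

loader = DataLoader(
    concat,
    batch_sampler=SingleDatasetBatchSampler([len(ds1), len(ds2)], batch_size=4),
)

for (x,) in loader:
    # every batch is homogeneous: all zeros or all ones
    assert x.eq(x[0]).all()
```

Note that `batch_sampler` is mutually exclusive with `batch_size`, `shuffle`, `sampler`, and `drop_last` in the DataLoader constructor, so all batching logic has to live in the sampler itself.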