Indefinitely loop secondary DataLoader?

I have two DataLoaders with different numbers of examples in each. Each time a batch is taken from the primary DataLoader, I want to sample a batch from the secondary DataLoader. Normally I would just iterate over a DataLoader to extract batches, but here the secondary DataLoader is asked for samples inside the primary DataLoader's for loop. How do I make sure the secondary DataLoader provides a new batch of examples on each iteration of the primary DataLoader's loop? Thank you for your time!

EDIT: In normal Python, itertools's cycle function might be used to create such an infinite iterator. Does this work with the DataLoader iterator? Or are there problems using this approach based on how the DataLoader is built?

1 Like

In this case, you can write a custom Dataset (http://pytorch.org/docs/master/data.html#torch.utils.data.Dataset) that, for a given index, returns the indexed data from the first dataset together with a random sample from the second. :slight_smile:
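A minimal sketch of that idea, assuming both underlying datasets support `len()` and integer indexing (the class name `PairedDataset` is made up for illustration):

```python
import random

from torch.utils.data import Dataset


class PairedDataset(Dataset):
    """Pairs each primary example with a randomly drawn secondary example."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def __len__(self):
        # The length follows the primary dataset.
        return len(self.primary)

    def __getitem__(self, idx):
        primary_item = self.primary[idx]
        # Draw an independent random index into the secondary dataset,
        # so its (possibly smaller) length doesn't matter.
        secondary_item = self.secondary[random.randrange(len(self.secondary))]
        return primary_item, secondary_item
```

Wrapping this in a single DataLoader then yields a `(primary_batch, secondary_batch)` pair per iteration, so the secondary data is resampled automatically every time.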

1 Like

Of course! Apparently I was thinking about this harder than I needed to. Thank you!

@mattrobin
I tried using itertools' cycle function on a DataLoader that loads ImageNet data to implement the same thing you are trying to do. The CPU memory usage keeps increasing and the system crashes every time.
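A likely cause: `itertools.cycle` keeps an internal copy of every element it has seen so it can replay them, so cycling a DataLoader holds every batch from the first pass in memory. A memory-safe alternative is a generator that simply re-iterates the loader when it is exhausted (a sketch, assuming re-iterating the DataLoader is acceptable, which also re-shuffles each pass if `shuffle=True`):

```python
def infinite_loader(loader):
    """Yield batches forever by restarting iteration each epoch.

    Unlike itertools.cycle, this keeps no copies of past batches.
    """
    while True:
        for batch in loader:
            yield batch


# Usage sketch: draw one secondary batch per primary batch.
# secondary_iter = infinite_loader(secondary_loader)
# for primary_batch in primary_loader:
#     secondary_batch = next(secondary_iter)
```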

Were you able to design a custom Dataset as suggested by @SimonW? If so, could you please share the code?