Using Two Data Iterators Under One Loop

Dear Pytorch enthusiasts,

I have a question for you.

I have two datasets, and I would like to use a separate data iterator for each of them to train a single model.

To do so, I have wrapped each of them in a DataLoader to get a data iterator, with the shuffle argument set to True.

Then, for each epoch, I have used them together in one loop as follows:

for idx, datum in enumerate(zip(data_iterator1, data_iterator2)):

Here, data_iterator1 is ~28 times longer than data_iterator2 (in number of batches).
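For reference, a minimal, self-contained version of this setup might look like the sketch below. The datasets, sizes, and batch size are made up; only the ~28:1 ratio of batches matters:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Made-up stand-ins for the two real datasets (2800 vs. 100 samples,
    # so data_iterator1 yields ~28x as many batches as data_iterator2).
    dataset1 = TensorDataset(torch.randn(2800, 10))
    dataset2 = TensorDataset(torch.randn(100, 10))

    data_iterator1 = DataLoader(dataset1, batch_size=10, shuffle=True)
    data_iterator2 = DataLoader(dataset2, batch_size=10, shuffle=True)

    for idx, (batch1, batch2) in enumerate(zip(data_iterator1, data_iterator2)):
        pass  # one training step using a batch from each dataset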

This leaves me with the following questions:

  1. When does the loop end? Does it end once it has seen all the batches belonging to data_iterator2, or only after it has seen all the batches belonging to data_iterator1?
  2. If the loop ends after seeing all the batches belonging to data_iterator2, are the data belonging to both data iterators shuffled again at the next epoch?

I would be thankful if someone here could clarify these two questions for me.

Thanks for your time.

Best regards,

zip gets exhausted as soon as the first of its iterators gets exhausted (that's plain Python behavior, not PyTorch).

If I'm not wrong (I don't remember exactly), the zip object will stay exhausted until you rebuild it. If you do so, both iterators will restart.
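A quick plain-Python illustration of both points, no PyTorch needed:

    long_iter = iter(range(28))
    short_iter = iter(range(1))

    pairs = list(zip(long_iter, short_iter))
    print(len(pairs))        # 1 -- zip stops once short_iter is exhausted
    print(next(long_iter))   # 2 -- zip also consumed one extra item from long_iter

    # A consumed zip object stays empty; you have to build a new one:
    z = zip(range(3), range(3))
    print(list(z))           # [(0, 0), (1, 1), (2, 2)]
    print(list(z))           # [] -- still exhausted until rebuilt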

Much obliged for answering, Juan!

Could you explain in more detail what you mean by "both iterators will restart"?

When you call zip, you are invoking the iterator protocol of each object.
Depending on how that has been coded, it will behave one way or another.
Here you have an explanation:
https://anandology.com/python-practice-book/iterators.html

Objects which are supposed to be iterated several times are coded in such a way that each new iteration gets a fresh iterator, so once one pass is exhausted you can iterate over them again without re-instantiating.
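For instance, a re-iterable object typically follows this pattern; Numbers here is a made-up class just for illustration. Its __iter__ hands out a fresh iterator on every call, so the object itself never stays exhausted:

    class Numbers:
        def __init__(self, n):
            self.n = n

        def __iter__(self):
            # A fresh iterator is created on every call, so each new
            # for-loop starts from the beginning again.
            return iter(range(self.n))

    nums = Numbers(3)
    print(list(nums))  # [0, 1, 2]
    print(list(nums))  # [0, 1, 2] -- iterable again without re-instantiating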

So does that mean that, when they are restarted, they are not shuffled?