Using Two Data Iterators Under One Loop

Dear Pytorch enthusiasts,

I have a question for you.

I have two datasets, and I would like to use a separate data iterator for each of them to train a single model.

To do so, I have wrapped each of them in a DataLoader to get a data iterator, with the shuffle argument set to True.

Then, for each epoch, I have used them together in one loop as follows:

for idx, datum in enumerate(zip(data_iterator1, data_iterator2)):

Here, data_iterator1 is ~28 times longer than data_iterator2 (in number of batches).
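For reference, a minimal, self-contained version of this setup might look like the sketch below. The datasets, sizes, and batch size are made up; only the ~28:1 ratio of batches matters:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Made-up stand-ins for the two real datasets (2800 vs. 100 samples,
    # so data_iterator1 yields ~28x as many batches as data_iterator2).
    dataset1 = TensorDataset(torch.randn(2800, 10))
    dataset2 = TensorDataset(torch.randn(100, 10))

    data_iterator1 = DataLoader(dataset1, batch_size=10, shuffle=True)
    data_iterator2 = DataLoader(dataset2, batch_size=10, shuffle=True)

    for idx, (batch1, batch2) in enumerate(zip(data_iterator1, data_iterator2)):
        pass  # one training step using a batch from each dataset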

This leaves me with the following questions:

  1. When does the loop end? Does it end once it has seen all the batches belonging to data_iterator2, or only after it has seen all the batches belonging to data_iterator1?
  2. If the loop ends after seeing all the batches belonging to data_iterator2, are the data belonging to both data iterators shuffled again at the next epoch?

I would be thankful if someone here could clarify these two questions for me.

Thanks for your time.

Best regards,

zip gets exhausted as soon as the first of its iterators gets exhausted (that's plain Python behavior, not PyTorch).

If I'm not wrong (I don't remember exactly), the zip object will stay exhausted until you rebuild it. If you do so, both iterators will restart.
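A quick plain-Python illustration of both points, no PyTorch needed:

    long_iter = iter(range(28))
    short_iter = iter(range(1))

    pairs = list(zip(long_iter, short_iter))
    print(len(pairs))        # 1 -- zip stops once short_iter is exhausted
    print(next(long_iter))   # 2 -- zip also consumed one extra item from long_iter

    # A consumed zip object stays empty; you have to build a new one:
    z = zip(range(3), range(3))
    print(list(z))           # [(0, 0), (1, 1), (2, 2)]
    print(list(z))           # [] -- still exhausted until rebuilt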

Much obliged for answering, Juan!

Could you explain in more detail what you mean by "both iterators will restart"?

When you call zip, you are invoking the iterator protocol of each object.
Depending on how that has been coded, it will behave one way or another.
Here you have an explanation:
https://anandology.com/python-practice-book/iterators.html

Objects which are supposed to be iterated several times are coded in such a way that each new iteration gets a fresh iterator, so once one pass is exhausted you can iterate over them again without re-instantiating.
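For instance, a re-iterable object typically follows this pattern; Numbers here is a made-up class just for illustration. Its __iter__ hands out a fresh iterator on every call, so the object itself never stays exhausted:

    class Numbers:
        def __init__(self, n):
            self.n = n

        def __iter__(self):
            # A fresh iterator is created on every call, so each new
            # for-loop starts from the beginning again.
            return iter(range(self.n))

    nums = Numbers(3)
    print(list(nums))  # [0, 1, 2]
    print(list(nums))  # [0, 1, 2] -- iterable again without re-instantiating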

So does that mean that, when they are restarted, they are not shuffled?