How to loop through two datasets at once with iterators in PyTorch without getting a StopIteration error?

Brando_Miranda · March 6, 2018, 5:53pm

Is the following the recommended way to do it in PyTorch?

for epoch in range(nb_epochs):
    for train_data, test_data in zip(trainloader, testloader):
        # do train
        pass

Is this the PyTorch way to do it? Are there any inefficiencies or subtle issues I should worry about?

richard · March 6, 2018, 6:15pm

Depending on how large trainloader and testloader are, you could use the itertools version of zip, izip: https://docs.python.org/2/library/itertools.html#itertools.izip

Brando_Miranda · March 6, 2018, 10:09pm

I want to track the train and test error at the end of each epoch.
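A common way to get this, not something proposed in the thread, is to skip zipping altogether: loop over the full training loader, then run a separate pass over each set at the end of the epoch. Below is a minimal sketch, assuming plain lists stand in for the DataLoaders (a DataLoader can likewise be iterated from the start again in every epoch). `train_step`, `eval_error` and `run` are hypothetical placeholders for real training and evaluation code.

```python
def train_step(batch):
    # Hypothetical stand-in for forward pass, backward pass and optimizer step.
    return sum(batch) / len(batch)


def eval_error(loader):
    # Hypothetical stand-in for evaluation: mean of a per-batch "error" over the whole loader.
    errors = [sum(batch) / len(batch) for batch in loader]
    return sum(errors) / len(errors)


def run(trainloader, testloader, nb_epochs):
    history = []
    for epoch in range(nb_epochs):
        nb_batches = 0
        for batch in trainloader:  # every training batch is used; nothing is truncated
            train_step(batch)
            nb_batches += 1
        # End of epoch: one full pass over each set to record train and test error.
        history.append((epoch, nb_batches, eval_error(trainloader), eval_error(testloader)))
    return history


trainloader = [[1, 2], [3, 4], [5, 6], [7, 8]]  # stands in for a 4-batch training DataLoader
testloader = [[1, 1], [3, 3]]                   # stands in for a 2-batch test DataLoader

history = run(trainloader, testloader, nb_epochs=2)
print(history)  # [(0, 4, 4.5, 2.0), (1, 4, 4.5, 2.0)]
```

In real PyTorch code, the evaluation pass would typically run under `torch.no_grad()` with the model switched to `model.eval()`.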

Brando_Miranda · March 6, 2018, 10:27pm

What is the difference between using izip and zip?

richard · March 6, 2018, 10:42pm

If you’re using Python 3, you should use zip: it already returns an iterator (itertools.izip no longer exists in Python 3).

If you’re using Python 2, you can get some memory gains by using izip over zip. This is because in Python 2 zip returns a list instead of an iterator, while izip returns an iterator. If you’re zipping very long sequences, that list of pairs will take up memory.
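The laziness of Python 3's zip can be checked directly. The small sketch below (Python 3 only; the names are illustrative) zips two ranges of a billion elements each: the call returns immediately because pairs are produced one at a time on demand, rather than built up front as a list the way Python 2's zip would.

```python
# Python 3: zip returns a lazy iterator, so no billion-element list of pairs is built here.
pairs = zip(range(10**9), range(10**9))

print(type(pairs).__name__)  # zip
print(next(pairs))           # (0, 0)
print(next(pairs))           # (1, 1)
```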

Brando_Miranda · March 6, 2018, 10:43pm

But in the context of deep learning, zip truncates my training to the size of the test set, because test sets are usually smaller. I am sure someone has dealt with this before me…
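One way to keep the whole training set while still drawing a test batch at every step, which is not something suggested in the thread, is to hold an explicit iterator over the test loader and recreate it when `next()` raises StopIteration. Below is a minimal sketch, assuming lists stand in for DataLoaders and that the test loader yields at least one batch. Calling `iter()` on a DataLoader likewise starts a fresh pass, reshuffling if `shuffle=True`. The generator name `paired_batches` is hypothetical.

```python
def paired_batches(trainloader, testloader):
    """Yield (train_batch, test_batch) for every training batch.

    The test loader is restarted whenever it runs out, so the training set
    is never truncated to the length of the (usually smaller) test set.
    Assumes testloader yields at least one batch.
    """
    test_iter = iter(testloader)
    for train_batch in trainloader:
        try:
            test_batch = next(test_iter)
        except StopIteration:
            # Test set exhausted: start a new pass over it.
            test_iter = iter(testloader)
            test_batch = next(test_iter)
        yield train_batch, test_batch


trainloader = ["t0", "t1", "t2", "t3", "t4"]
testloader = ["v0", "v1"]

print(list(zip(trainloader, testloader)))             # only 2 pairs: training is truncated
print(list(paired_batches(trainloader, testloader)))  # 5 pairs: one per training batch
```

`itertools.cycle(testloader)` would also avoid the truncation. However, it stores every item from the first pass and replays them in the same order, so with a shuffled DataLoader the test batches would never be reshuffled and would all be kept in memory.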

richard · March 6, 2018, 10:46pm

I’m not sure what you’re looking for, but itertools has a zip_longest function that might be helpful: https://docs.python.org/3.0/library/itertools.html
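For reference, `itertools.zip_longest` (Python 3; `izip_longest` in Python 2) keeps going until the longest input is exhausted and pads the shorter one with `fillvalue`, which defaults to `None`. Applied to the two loaders, the loop would see `None` as the test batch once the test set runs out, so it has to check for that. Below is a small sketch with lists standing in for the loaders (names are illustrative):

```python
from itertools import zip_longest

trainloader = ["t0", "t1", "t2", "t3"]
testloader = ["v0", "v1"]

steps = []
for train_batch, test_batch in zip_longest(trainloader, testloader):
    # Once the test set is exhausted, zip_longest fills in None (the default fillvalue).
    if test_batch is None:
        steps.append((train_batch, "no test batch"))
    else:
        steps.append((train_batch, test_batch))

print(steps)
```

If the test set were the longer one, `train_batch` would be `None` for the extra steps instead, so a real loop may need to check both values.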