Usage of drop_last on data_loader

oasjd7 · January 15, 2020, 1:35pm

dataset_a: 60,000 images
dataset_b: 100,000 images
If I use drop_last=True in the DataLoader, can I still use all of the data in dataset_b?

for step, (data_a, data_b) in enumerate(zip(data_loader_a, data_loader_b)):
    ...

beaupreda · January 15, 2020, 2:38pm

Hello,

Setting drop_last=True drops the last batch when the number of examples in your dataset is not divisible by your batch_size, while drop_last=False keeps it, so the last batch is simply smaller than your batch_size (see docs). This is unrelated to whether or not you see the whole of dataset_b.
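To make the arithmetic concrete, here is a rough sketch in plain Python of how many batches each setting gives and how many examples get dropped. The helper names and the batch size of 64 are my own assumptions for illustration, not part of the DataLoader API; the counts are meant to mirror what len() of a DataLoader over a map-style dataset would report.

```python
import math


def num_batches(num_examples: int, batch_size: int, drop_last: bool) -> int:
    """Batches per epoch (hypothetical helper, not a PyTorch function)."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_examples // batch_size
    # The incomplete final batch is kept, so round up.
    return math.ceil(num_examples / batch_size)


def dropped_examples(num_examples: int, batch_size: int, drop_last: bool) -> int:
    """Examples skipped in one epoch because of drop_last."""
    return num_examples % batch_size if drop_last else 0


BATCH_SIZE = 64  # assumed value, the original post does not give one

for name, size in (("dataset_a", 60_000), ("dataset_b", 100_000)):
    for drop_last in (True, False):
        print(
            name,
            "drop_last =", drop_last,
            "batches =", num_batches(size, BATCH_SIZE, drop_last),
            "dropped =", dropped_examples(size, BATCH_SIZE, drop_last),
        )
```

So at most batch_size - 1 examples per dataset are affected by drop_last, which is tiny compared to the difference between the two dataset sizes.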

In your case, you will not use the whole of dataset_b, because zip stops as soon as the shorter of the two iterables is exhausted, i.e. data_loader_a. In other words, assuming both loaders use the same batch_size, you will have n iterations (step), where n is 60000 / batch_size rounded down with drop_last=True (rounded up otherwise), which means that roughly 40,000 examples of dataset_b will not be seen in each pass over the loop.
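If it helps, the truncation can be sketched without PyTorch at all, since it comes from Python's built-in zip. The helper names and the batch size of 64 below are assumptions for illustration, and both loaders are assumed to share the same batch size.

```python
def batches(num_examples: int, batch_size: int, drop_last: bool) -> int:
    """Batches per epoch for one loader (hypothetical helper)."""
    if drop_last:
        return num_examples // batch_size
    return -(-num_examples // batch_size)  # ceiling division


def unseen_in_b(len_a: int, len_b: int, batch_size: int, drop_last: bool) -> int:
    """Examples of the second dataset never reached when the loaders are zipped."""
    # zip stops at the shorter iterable, so the step count is the minimum.
    steps = min(
        batches(len_a, batch_size, drop_last),
        batches(len_b, batch_size, drop_last),
    )
    seen_b = min(steps * batch_size, len_b)
    return len_b - seen_b


# Toy demonstration of zip stopping at the shorter iterable.
pairs = list(zip(range(3), range(5)))
print(len(pairs))  # 3 pairs; the last two items of range(5) are never visited

BATCH_SIZE = 64  # assumed value, the original post does not give one
for drop_last in (True, False):
    print(
        "drop_last =", drop_last,
        "unseen examples of dataset_b =",
        unseen_in_b(60_000, 100_000, BATCH_SIZE, drop_last),
    )
```

Either way the number of unseen examples stays close to 40,000; drop_last only shifts it by a few dozen.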

Hope it clarifies a bit!