About DataLoader options

In a classification task, what are the commonly used options?
Is it OK to set the three options below to True for both train_data and test_data?

If I have two datasets (MNIST, SVHN) with different sizes, can I still set all three options to True?

drop_last=True
pin_memory=True
shuffle=True

It depends on what you are trying to do with your datasets.
Let’s walk through the arguments one by one.

drop_last drops the last batch if it would contain fewer samples than your specified batch size. While this might be useful during training, it can be harmful for your test data if you need predictions for all test samples, e.g. for a Kaggle competition.
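As a small illustration of the effect, here is a sketch with a toy dataset of 10 samples and a batch size of 4 (both numbers are made up for the example, not taken from your setup):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset (assumption): 10 samples, so batch_size=4 leaves a final batch of 2.
dataset = TensorDataset(torch.arange(10).float().unsqueeze(1))

keep_loader = DataLoader(dataset, batch_size=4, drop_last=False)
drop_loader = DataLoader(dataset, batch_size=4, drop_last=True)

# Collect the number of samples in each batch.
kept_sizes = [batch[0].shape[0] for batch in keep_loader]
dropped_sizes = [batch[0].shape[0] for batch in drop_loader]

print(kept_sizes)     # [4, 4, 2]
print(dropped_sizes)  # [4, 4]
```

With drop_last=True the last two samples are never returned, which is why you would usually leave it at False for a test set.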

pin_memory places the loaded batches in pinned (page-locked) host memory, which speeds up the transfer of your data to the GPU. Have a look at this blogpost for more information.
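A rough sketch of how this is typically used, assuming a toy random dataset and enabling pinning only when a GPU is actually available (the tensor shapes are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy data (assumption): 8 samples with 3 features and binary labels.
dataset = TensorDataset(torch.randn(8, 3), torch.randint(0, 2, (8,)))

# Pinning only helps for host-to-GPU copies, so enable it conditionally.
loader = DataLoader(dataset, batch_size=4, pin_memory=torch.cuda.is_available())

for inputs, targets in loader:
    # With pinned source memory, non_blocking=True allows an asynchronous copy.
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
```

On a CPU-only machine the flag brings no benefit, so it is left off in that case here.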

shuffle randomizes the order of your samples in each epoch. While it’s most likely beneficial for your training data to be shuffled, it might be harmful for your test data if you need your predictions in a specific order (e.g. Kaggle again). Shuffling your validation and test data is useless at best, as it won’t have any beneficial effect.
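Putting this together, a common pattern is to shuffle only the training loader. The following is a minimal sketch using toy index tensors in place of real data, so the sample order is easy to inspect:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data (assumption): each "sample" is just its own index 0..9.
sample_ids = torch.arange(10)
train_set = TensorDataset(sample_ids)
test_set = TensorDataset(sample_ids)

# Shuffle for training, keep the original order for testing.
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
test_loader = DataLoader(test_set, batch_size=4, shuffle=False)

train_seen = torch.cat([batch[0] for batch in train_loader])
test_seen = torch.cat([batch[0] for batch in test_loader])

# The test loader returns samples in dataset order, so predictions line up
# with the original sample IDs; the train loader returns a permutation.
print(torch.equal(test_seen, sample_ids))  # True
```

The training loader still visits every sample exactly once per epoch, just in a random order.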
