Dataset class train/test split without second call to Dataset object?

kirk86 · April 15, 2019, 5:01pm

Hi folks,
when the dataset object is first created with the train=True option, i.e.
dset = torchvision.datasets.CIFAR10(...)
it contains the following params:

We can see that test_list also points to the test set of the dataset. Would it be valuable to be able to load the test set by setting dset.train = False, instead of having to call torchvision.datasets.CIFAR10(..., train=False) again? Or is what I’m proposing completely wrong?

ptrblck · April 19, 2019, 10:19pm

Just setting the attribute won’t change anything, since self.data and self.targets were already loaded in __init__, as seen in these lines of code.
You would therefore have to implement a new method which reloads the test/train data, which would basically be another call to __init__.
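
To illustrate why flipping the flag has no effect, here is a minimal sketch in plain Python. ToyDataset and its hard-coded samples are hypothetical stand-ins, not torchvision code; the only assumption carried over is that the split is chosen once inside __init__.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that picks its split in __init__."""

    # Pretend these are the samples stored on disk for each split.
    _TRAIN = [("train_img_0", 0), ("train_img_1", 1), ("train_img_2", 0)]
    _TEST = [("test_img_0", 1), ("test_img_1", 0)]

    def __init__(self, train=True):
        self.train = train
        # The split is read once, here; nothing below consults self.train again.
        samples = self._TRAIN if train else self._TEST
        self.data = [x for x, _ in samples]
        self.targets = [y for _, y in samples]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index], self.targets[index]


dset = ToyDataset(train=True)
dset.train = False                    # flips the flag only
print(len(dset), dset[0])             # 3 ('train_img_0', 0): still training data

test_dset = ToyDataset(train=False)   # a second construction is what reloads
print(len(test_dset), test_dset[0])   # 2 ('test_img_0', 1)
```

A method that re-reads the data after the flag changes would have to repeat the body of __init__, which is why it amounts to constructing the dataset a second time.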

In my opinion, you should definitely create separate train and test datasets: data leakage is a common bug even in code that keeps them separate, and sharing one object makes it more likely.
Also, the training and test transformations often differ, so you would have to pass them to the custom method again as well.
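
A rough sketch of what two separate datasets with their own transforms could look like. The helper name make_cifar10_splits, the root path, the particular augmentations, and the 0.5 normalization values are all assumptions chosen for illustration, not something prescribed in this thread.

```python
def make_cifar10_splits(root="./data", download=False):
    """Build independent train and test CIFAR10 datasets (illustrative sketch)."""
    # Imported here so that merely defining the helper does not need torchvision.
    from torchvision import datasets, transforms

    # Placeholder statistics (assumption), not the real CIFAR10 channel statistics.
    normalize = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

    # Random augmentation is applied to the training split only.
    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
    # The test split stays deterministic.
    test_transform = transforms.Compose([
        transforms.ToTensor(),
        normalize,
    ])

    train_set = datasets.CIFAR10(
        root=root, train=True, transform=train_transform, download=download
    )
    test_set = datasets.CIFAR10(
        root=root, train=False, transform=test_transform, download=download
    )
    return train_set, test_set
```

Calling make_cifar10_splits(download=True) would fetch the data on first use and return two objects that share nothing but the files on disk, so each split keeps its own transform.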

kirk86 · August 8, 2019, 10:30am

Thanks for the clarifications!