How to understand this MNIST dataloading code?

SouthTorch · June 19, 2022, 11:31pm

I successfully used the following code to download the training and test data of the MNIST dataset, but I don’t understand how two things work:

1: train_data and test_data both come from datasets.MNIST, but train_data has 60K samples and test_data has a different 10K samples. Which part of the syntax selects which data is loaded? Is train=True/False the switch for that?

2: Why does test_data get its 10K samples without download=True?

Here is the code:

import torchvision

# Training split
train_data = torchvision.datasets.MNIST('../data', train=True, download=True,
                   transform=torchvision.transforms.Compose([
                       torchvision.transforms.ToTensor(),
                       torchvision.transforms.Normalize((0.1307,), (0.3081,))
                   ]))

# Test split (note: no download=True here)
test_data = torchvision.datasets.MNIST('../data', train=False,
                   transform=torchvision.transforms.Compose([
                       torchvision.transforms.ToTensor(),
                       torchvision.transforms.Normalize((0.1307,), (0.3081,))
                   ]))

Thanks.

ptrblck · June 20, 2022, 2:47am

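Yes, the train=True/False argument determines whether the dataset loads the training or the test split, as seen in the torchvision source.
With download=True, the complete dataset (both the training and the test files) is downloaded before the corresponding train/test binary is loaded, as also seen in the source. That is why your second call works without download=True: the test files were already placed in '../data' by the first call.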
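
To make the mechanism concrete, here is a minimal sketch in plain Python. It is not torchvision's actual code: ToyMNIST is a hypothetical stand-in that mimics only the train/download logic described above, and the "download" just writes empty placeholder files instead of fetching anything. The four file names are the real MNIST file names.

```python
import os
import tempfile

# The four MNIST files: two for the training split, two for the test split.
RESOURCES = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
]


class ToyMNIST:
    """Hypothetical stand-in; mimics only the train/download logic."""

    def __init__(self, root, train=True, download=False):
        self.root = root
        self.train = train
        if download:
            self._download()
        if not self._check_exists():
            raise RuntimeError("Dataset not found. Use download=True to fetch it.")
        # The train flag only decides which pair of files is read.
        prefix = "train" if train else "t10k"
        self.image_file = os.path.join(root, f"{prefix}-images-idx3-ubyte")
        self.label_file = os.path.join(root, f"{prefix}-labels-idx1-ubyte")

    def _check_exists(self):
        return all(os.path.exists(os.path.join(self.root, name)) for name in RESOURCES)

    def _download(self):
        # Fetches all four files, regardless of the train flag.
        if self._check_exists():
            return
        os.makedirs(self.root, exist_ok=True)
        for name in RESOURCES:
            # Placeholder file instead of a real network download.
            with open(os.path.join(self.root, name), "wb") as f:
                f.write(b"")


root = os.path.join(tempfile.mkdtemp(), "data")
train_data = ToyMNIST(root, train=True, download=True)  # writes all four files
test_data = ToyMNIST(root, train=False)                 # finds the files already on disk
print(os.path.basename(train_data.image_file))
print(os.path.basename(test_data.image_file))
```

If you swapped the order and created the train=False object first, without download=True, on an empty directory, this sketch would raise the RuntimeError, which mirrors what you would expect from the real dataset class when the files are missing.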

SouthTorch · June 20, 2022, 3:08am

Thanks, ptrblck!

You really amazed me by pointing out those exact lines of code.

Do you really read the source code of an API when learning it? Or are you one of the code contributors?

Thanks.

ptrblck · June 20, 2022, 3:15am

Ha, thanks. I’ve looked at the code often enough to be familiar with it.