Saving split dataset

We can split a dataset with torch.utils.data.random_split. However, to reproduce results later, is it possible to save the split datasets and load them again?

You could use a seed for the random number generator (torch.manual_seed) and make sure the split is the same every time.
Alternatively, you could split the sample indices, store each index tensor locally via torch.save, and use it in Subset.
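
A minimal sketch of the second approach, assuming trainval_dataset, train_size, and val_size are already defined in your script, could look like this:

import torch
from torch.utils.data import Subset

# Create a reproducible permutation of the sample indices.
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(len(trainval_dataset), generator=generator)
train_indices = perm[:train_size]
val_indices = perm[train_size:train_size + val_size]

# Store the index tensors so the exact split can be restored later.
torch.save(train_indices, 'train_indices.pt')
torch.save(val_indices, 'val_indices.pt')

# In a later run, rebuild the identical subsets from the saved indices.
# (.tolist() keeps indexing compatible with datasets expecting Python ints)
train_dataset = Subset(trainval_dataset, torch.load('train_indices.pt').tolist())
val_dataset = Subset(trainval_dataset, torch.load('val_indices.pt').tolist())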


Thank you ptrblck for the great answer, as always. It does work, but I stumbled upon a strange issue.

I split my training data into a training and a validation set using a deterministic seed, as suggested:

torch.manual_seed(0)
train_dataset, val_dataset = torch.utils.data.random_split(trainval_dataset, [train_size, val_size])

I then wanted to evaluate the CNN on the validation set (using torchvision CIFAR10). When I evaluate it on the test set, the accuracy is always the same, as expected. However, when I evaluate it on the validation set, the accuracy changes between runs. Delving into the code, I realized the problem is not with the validation set itself (its targets are identical every time I run the script) but that the network produces different outputs.

How is that possible? Especially since it only appears when feeding the validation set, not the test set?

Are you calling model.eval() when checking the accuracy on the validation set?
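
For reference, a typical evaluation pass looks roughly like this sketch (model and val_loader are assumed to be defined in your script):

import torch

model.eval()                   # use running BatchNorm stats, disable dropout
correct, total = 0, 0
with torch.no_grad():          # no gradients needed during evaluation
    for inputs, targets in val_loader:
        inputs, targets = inputs.to('cuda'), targets.to('cuda')
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)
        correct += predicted.eq(targets).sum().item()
        total += targets.size(0)
print('validation accuracy:', correct / total)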

Yes, I am. I also just tried model.train() for the sake of completeness and the same erratic behavior happens.

To understand the issue completely: you are calling model.eval() and checking the accuracy on the validation set. The validation set indices (passed to Subset) are definitely the same, but the accuracy changes between consecutive runs?

Precisely. Here are two different runs as examples. I show the output for the same batch, printing, respectively:

   print(predicted.eq(targets).sum().item())
   print(predicted.eq(targets))
   print(predicted)
   print(targets)

The targets are the same, but the predicted values vary slightly.

Run 1:


118
tensor([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0', dtype=torch.uint8)
tensor([4, 8, 1, 0, 2, 4, 3, 4, 5, 6, 4, 4, 5, 7, 3, 2, 4, 5, 1, 7, 9, 9, 7, 9,
2, 4, 1, 1, 4, 8, 2, 9, 7, 6, 9, 1, 2, 9, 1, 1, 5, 1, 7, 7, 9, 4, 3, 3,
4, 6, 0, 5, 5, 5, 7, 7, 0, 7, 0, 4, 7, 3, 6, 1, 4, 0, 4, 0, 3, 1, 4, 8,
7, 6, 3, 7, 0, 5, 2, 0, 8, 5, 0, 2, 9, 7, 2, 2, 2, 8, 9, 6, 1, 1, 9, 1,
4, 9, 4, 8, 7, 6, 4, 7, 7, 8, 0, 6, 7, 4, 7, 5, 8, 3, 1, 3, 9, 8, 5, 8,
4, 2, 3, 7, 7, 2, 5, 1], device='cuda:0')
tensor([4, 8, 1, 2, 2, 4, 3, 4, 5, 6, 4, 4, 5, 7, 5, 2, 4, 5, 1, 7, 1, 9, 7, 9,
2, 4, 1, 1, 4, 8, 2, 9, 7, 6, 9, 1, 2, 9, 1, 1, 5, 1, 7, 7, 9, 4, 3, 3,
4, 6, 0, 5, 5, 5, 7, 7, 0, 7, 0, 4, 7, 3, 6, 1, 5, 0, 4, 0, 3, 1, 4, 8,
7, 6, 3, 7, 2, 5, 3, 0, 8, 5, 0, 2, 9, 7, 2, 3, 2, 8, 9, 6, 1, 1, 9, 1,
4, 9, 4, 8, 7, 5, 4, 7, 7, 8, 0, 6, 7, 4, 7, 5, 8, 6, 1, 3, 1, 8, 5, 8,
4, 2, 3, 7, 7, 2, 5, 1], device='cuda:0')

Run 2:


121
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0', dtype=torch.uint8)
tensor([4, 8, 1, 2, 2, 4, 3, 4, 5, 6, 4, 4, 5, 7, 3, 2, 4, 5, 1, 7, 1, 9, 7, 9,
2, 4, 1, 1, 4, 8, 2, 9, 7, 6, 9, 1, 2, 9, 1, 1, 5, 1, 7, 7, 9, 4, 3, 3,
4, 6, 0, 5, 5, 3, 7, 7, 0, 7, 0, 4, 7, 3, 6, 1, 5, 0, 4, 0, 3, 1, 4, 8,
7, 6, 3, 7, 2, 5, 3, 0, 0, 5, 0, 2, 9, 7, 2, 2, 2, 8, 9, 6, 1, 1, 9, 1,
4, 9, 4, 8, 4, 5, 4, 7, 7, 8, 0, 6, 7, 4, 7, 5, 8, 3, 1, 4, 1, 8, 5, 8,
4, 2, 3, 7, 7, 2, 5, 1], device='cuda:0')
tensor([4, 8, 1, 2, 2, 4, 3, 4, 5, 6, 4, 4, 5, 7, 5, 2, 4, 5, 1, 7, 1, 9, 7, 9,
2, 4, 1, 1, 4, 8, 2, 9, 7, 6, 9, 1, 2, 9, 1, 1, 5, 1, 7, 7, 9, 4, 3, 3,
4, 6, 0, 5, 5, 5, 7, 7, 0, 7, 0, 4, 7, 3, 6, 1, 5, 0, 4, 0, 3, 1, 4, 8,
7, 6, 3, 7, 2, 5, 3, 0, 8, 5, 0, 2, 9, 7, 2, 3, 2, 8, 9, 6, 1, 1, 9, 1,
4, 9, 4, 8, 7, 5, 4, 7, 7, 8, 0, 6, 7, 4, 7, 5, 8, 6, 1, 3, 1, 8, 5, 8,
4, 2, 3, 7, 7, 2, 5, 1], device='cuda:0')

Were you able to exactly reproduce the same model parameters for your runs?
I.e. did you compare the state_dicts before running the validation loop?
Even if you are seeding and getting the same data samples for your runs, the result might still differ, e.g. due to cudnn as described in the Reproducibility docs.
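
For reference, a minimal sketch of the settings mentioned there (the exact recommendations depend on your PyTorch version, and deterministic mode can be slower):

import torch

# Seed the CPU and all GPU RNGs, and force deterministic cudnn behavior.
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True   # pick deterministic cudnn kernels
torch.backends.cudnn.benchmark = False      # disable non-deterministic autotuning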

I load the same pre-trained model in both cases. When I run this model on the test set, the output is always the same. Do you think there could still be room for such nondeterministic behavior?

Did you follow the advice from the reproducibility docs?
Are you using any random transformations in your Dataset for the validation set?
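
For example, the validation pipeline would normally use only deterministic transforms, roughly like this sketch (names are illustrative, not taken from your script):

import torchvision.transforms as transforms

# Random augmentations belong in the training transform only; the
# validation transform should be fully deterministic.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
val_transform = transforms.Compose([
    transforms.ToTensor(),
])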