Does iterating over an unshuffled DataLoader change the random state?

Hello,

I am a little bit confused about the reproducibility and random state of PyTorch. I set seeds in my code, and everything is deterministic and reproducible. If I start my experiments with the same input, I get the same output. This is how I set the seeds:

    if args.deterministic:
        cudnn.benchmark = False
        cudnn.deterministic = True
        random.seed(args.seed)
        np.random.seed(args.seed)
        torch.manual_seed(args.seed)
        torch.cuda.manual_seed(args.seed)

However, I just found out that when I load a pre-trained model and evaluate it on my test DataLoader BEFORE I continue training, I get different accuracy results compared to not evaluating it before continuing training! The shuffle argument of my test DataLoader is set to False, num_workers is set to 0, and my test dataset has no transformations that require randomness.
I even tried simply iterating over my test DataLoader, without passing the data through my model, before continuing training. I still get different results, and they are identical to what I get if I evaluate the model on the test DataLoader before training.

Therefore, my question is: does simply iterating over a DataLoader change the random state?

Thank you, I would really appreciate your help 🙂

EDIT: I implemented a small example:
If I execute the following code:

    import torch
    torch.manual_seed(42)
    print(torch.rand(1))

I always get the following output:

    tensor([0.8823])

However, if I execute the following code:

    import torch
    from torchvision import datasets
    from torchvision.transforms import ToTensor
    from torch.utils.data import DataLoader

    torch.manual_seed(42)

    test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
    test_dataloader = DataLoader(test_data, batch_size=64, shuffle=False)
    for test_features, test_labels in test_dataloader:
        pass

    print(torch.rand(1))

I get the following output:

    tensor([0.3829])

I think the outputs should be the same since I am not performing any operations that require randomness.
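The state change can also be confirmed directly by snapshotting the global RNG state with torch.get_rng_state() before and after the loop. A minimal sketch, using a small in-memory TensorDataset as a stand-in for FashionMNIST (any dataset should behave the same way):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    torch.manual_seed(42)

    # Stand-in dataset; the effect comes from the DataLoader, not the data.
    data = TensorDataset(torch.zeros(10), torch.zeros(10))
    loader = DataLoader(data, batch_size=4, shuffle=False)

    state_before = torch.get_rng_state()
    for _ in loader:
        pass
    state_after = torch.get_rng_state()

    # The global RNG state has advanced, even with shuffle=False and num_workers=0.
    print(torch.equal(state_before, state_after))  # False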

That’s expected, since the _BaseDataLoaderIter sets its _base_seed when the iterator is created, via:

    self._base_seed = torch.empty((), dtype=torch.int64).random_(generator=loader.generator).item()
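If you need iterating to leave the default generator untouched, one option is to hand the DataLoader its own torch.Generator, so the base seed is drawn from that instead; another is to save and restore the global state around the loop. A sketch of both, assuming a PyTorch version where DataLoader accepts the generator argument (available in recent releases), again with a stand-in TensorDataset:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    torch.manual_seed(42)
    data = TensorDataset(torch.zeros(10), torch.zeros(10))

    # Option 1: a dedicated generator; the iterator's _base_seed is drawn
    # from it, so the default global generator is never consumed.
    g = torch.Generator()
    g.manual_seed(0)
    for _ in DataLoader(data, batch_size=4, shuffle=False, generator=g):
        pass
    print(torch.rand(1))  # tensor([0.8823]), same as right after manual_seed(42)

    # Option 2: snapshot and restore the global RNG state around the loop.
    state = torch.get_rng_state()
    for _ in DataLoader(data, batch_size=4, shuffle=False):
        pass
    torch.set_rng_state(state)

Either way, evaluation itself stays deterministic; the point is only that creating the test iterator no longer advances the global generator that your subsequent training depends on.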