Omroth
(Ian)
June 24, 2024, 9:31am
I’ve noticed that despite comprehensively seeding before starting training:

import random
import numpy as np
import torch

seed = 7
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
my training data is not loaded in the same order across runs.
It occurs to me that this is “expected” with num_workers > 1, because the worker processes compete for machine resources. Is that correct? Does it cause anyone else issues?
You need to pass a seeded generator (and a worker_init_fn) to the DataLoader. From https://pytorch.org/docs/stable/notes/randomness.html:
import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Re-seed NumPy and Python's random in each worker process,
    # derived from the base seed PyTorch assigns to that worker.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    worker_init_fn=seed_worker,
    generator=g,
)
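
As a quick sanity check, here is a minimal sketch (using a hypothetical toy TensorDataset as a stand-in for train_dataset, and reusing seed_worker from the snippet above): build the loader twice with the same seed and confirm the first epoch yields the same batch order, even with multiple workers. The shuffle order is drawn from the generator in the main process, so num_workers doesn’t change it.

import torch
from torch.utils.data import DataLoader, TensorDataset

def first_epoch_order(seed, dataset):
    # Re-seeding the generator reproduces the sampler's shuffle order.
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(
        dataset,
        batch_size=10,
        shuffle=True,
        num_workers=2,
        worker_init_fn=seed_worker,
        generator=g,
    )
    return [batch[0].tolist() for batch in loader]

if __name__ == "__main__":
    toy = TensorDataset(torch.arange(100))  # hypothetical stand-in for train_dataset
    assert first_epoch_order(0, toy) == first_epoch_order(0, toy)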