DataLoader Multi-threading Random Number

Everywhere I checked, I saw the note:

To use multi-threading with numpy random in the DataLoader, use the worker_init_fn with torch.initial_seed()

I’m trying to understand exactly what’s happening with this code snippet:

worker_init_fn=lambda _: np.random.seed(int(torch.initial_seed()) % (2**32 - 1))

I know that np.random.seed() requires an integer in the range [0, 2**32 - 1]. So converting the (possibly very large) value from torch.initial_seed() to an int and taking it modulo 2**32 - 1 yields a seed in [0, 2**32 - 2], which is within that range.
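
For context, here is a minimal sketch of how I understand this being wired up (the dataset class `RandomAugmentDataset`, the batch size, and the worker count are made up for illustration):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class RandomAugmentDataset(Dataset):
    """Toy dataset whose __getitem__ draws from numpy's global RNG."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # Without per-worker seeding, workers forked from the same
        # parent share numpy RNG state and can return duplicate draws.
        return np.random.rand()

if __name__ == "__main__":
    loader = DataLoader(
        RandomAugmentDataset(),
        batch_size=2,
        num_workers=4,
        # A lambda works with the fork start method (the Linux default);
        # platforms that spawn workers need a picklable named function.
        worker_init_fn=lambda _: np.random.seed(
            int(torch.initial_seed()) % (2**32 - 1)),
    )
    for batch in loader:
        print(batch)
```
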

Does this mean that each worker is initialized with this number as its seed?
Or does it mean that each worker is initialized with this number + worker_id as its seed?

And does the worker_id change between epochs? (I’m thinking it should, since each worker seems to be a new process spawned by the main Python process…?)


It means the former: each worker seeds NumPy with the value returned by its own call to torch.initial_seed(). Inside a worker, torch.initial_seed() already returns base_seed + worker_id, so the workers end up with different seeds even though the lambda ignores its worker_id argument.

No, the worker_id does not change between epochs. But each worker is seeded with base_seed + worker_id, and the main process draws a fresh base_seed every epoch, so the effective per-worker seeds do change from epoch to epoch. (Also note that the workers are separate processes, not threads.)
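
If you want to see this for yourself, here is a small sketch that prints the per-worker seeds (the `report_seed` name, worker count, and toy dataset are arbitrary; it assumes the default persistent_workers=False, so workers are re-created every epoch):

```python
import torch
from torch.utils.data import DataLoader

def report_seed(worker_id):
    # Inside a worker process, torch.initial_seed() returns
    # base_seed + worker_id for that epoch's base_seed.
    print(f"worker {worker_id}: torch.initial_seed() = {torch.initial_seed()}")

if __name__ == "__main__":
    loader = DataLoader(list(range(8)), num_workers=2,
                        worker_init_fn=report_seed)
    for epoch in range(2):
        # The main process draws a fresh base_seed each time the loader
        # is iterated, so the printed seeds change between epochs but
        # stay exactly worker_id apart within one epoch.
        for _ in loader:
            pass
```
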