I found that pre-fetching training samples with multiple threads also introduces randomness: in each new run, the samples are put into the queue in a different order, determined by the relative speed of the threads. I had to set the number of pre-fetching threads to 1 to solve the problem.
What’s more, if the pre-fetching thread (there is only one in this case) is not the main thread (i.e. it runs in parallel to the main thread) and both threads use random numbers, make sure the two threads use different random value generators, each with its own seed. Otherwise, the relative order in which the threads access a shared generator may differ between runs. To get separate generators, for example, the main thread can set the seed with numpy.random.seed(seed) and draw values with numpy.random.uniform(), while the pre-fetching thread creates its own generator with prng = numpy.random.RandomState(seed) and draws values with prng.uniform().
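For instance, here is a minimal sketch of this two-generator setup (the +1 seed offset is my own choice; any two distinct seeds work):

    import numpy

    seed = 42  # arbitrary

    # Main thread: seed the global generator and draw from it.
    numpy.random.seed(seed)
    value_main = numpy.random.uniform()

    # Pre-fetching thread: a private generator with its own seed, so its
    # draws never interleave with the main thread's global stream.
    prng = numpy.random.RandomState(seed + 1)
    value_prefetch = prng.uniform()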
BTW, I implemented the multi-threading in my own way using the threading package, not using the official one.
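To make this concrete, here is a minimal sketch of the kind of single-thread prefetcher described above; the names (prefetch, sample_queue) and the uniform batches are hypothetical stand-ins for real sample loading:

    import queue
    import threading
    import numpy

    SEED = 42          # arbitrary
    NUM_BATCHES = 100
    sample_queue = queue.Queue(maxsize=8)

    def prefetch():
        # A private generator keeps this thread's random draws
        # independent of the main thread's global numpy state.
        prng = numpy.random.RandomState(SEED + 1)
        for _ in range(NUM_BATCHES):
            batch = prng.uniform(size=(32, 10))  # stand-in for a real batch
            sample_queue.put(batch)

    # With a single pre-fetching thread, samples enter the queue
    # in one fixed order, so runs are repeatable.
    worker = threading.Thread(target=prefetch, daemon=True)
    worker.start()

    numpy.random.seed(SEED)  # the main thread keeps using the global generator
    for _ in range(NUM_BATCHES):
        batch = sample_queue.get()
        # ... training step consuming `batch` ...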