I’m trying to make my code reproducible using the same parameters, and it does in fact work fine when I do not use torch.multiprocessing.
So, in both versions of the code I have the following seed and cuDNN settings:
# Set seeds and cuDNN flags for deterministic results
import random
import numpy as np
import torch

torch.manual_seed(12345)
torch.cuda.manual_seed(12345)
np.random.seed(12345)
random.seed(12345)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
Also, I’m passing the following function as the worker_init_fn argument of torch.utils.data.DataLoader:
# Function to set the dataloader worker seed
def _init_fn(worker_id):
    np.random.seed(12345 + worker_id)
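For completeness, a variant of this function that also seeds Python’s random module and torch in each worker would look like the sketch below; this is just an extension of my function above, not something I have verified is necessary:

# Sketch: worker init that also seeds random and torch per worker
# (extends my _init_fn above)
def _init_fn(worker_id):
    seed = 12345 + worker_id
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)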
The DataLoader num_workers argument is set to 1.
While I get deterministic, reproducible results when running the code in a single process, training the network with torch.multiprocessing does not give me the same deterministic reproducibility.
The only difference between the two versions is that, when I use torch.multiprocessing, I set all the seeds and create the dataloader in the parent process, and I create the model and train it in a child process.
The question is: am I missing something to make my results reproducible when using torch.multiprocessing? Any insight is really appreciated.
By the way, I have just one GPU and I am spawning just one process at a time, so in the torch.multiprocessing code the parent is just a manager and the training happens in the child. I have multiple models, so each new model is trained in a new child process after the previous child has finished.
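For reference, here is a minimal, self-contained sketch of the structure I am describing; the toy dataset, model, and training loop are placeholders for my real code:

# Sketch of my setup: seeds and dataloader in the parent process,
# model creation and training in a child process (toy placeholders)
import random
import numpy as np
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

def _init_fn(worker_id):
    np.random.seed(12345 + worker_id)

def train_model(loader):
    # Child process: the model is created and trained here
    model = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for x, y in loader:
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    print(loss.item())  # not reproducible across runs in my real code

if __name__ == "__main__":
    # Parent process: set all seeds and create the dataloader
    torch.manual_seed(12345)
    torch.cuda.manual_seed(12345)
    np.random.seed(12345)
    random.seed(12345)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    dataset = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
    loader = DataLoader(dataset, batch_size=8, shuffle=True,
                        num_workers=1, worker_init_fn=_init_fn)

    # One child per model: each new child starts only after
    # the previous one has finished
    for _ in range(2):
        p = mp.Process(target=train_model, args=(loader,))
        p.start()
        p.join()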