It worked, thanks. Still, I think there should be a more straightforward way to get deterministic behavior. This "workaround" in the workers is not intuitive for beginners.
Based on my tests, even in PyTorch 0.4 we still need to initialize the workers with the same seed to get deterministic behavior. The following lines are NOT enough:
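(The snippet from the original post did not survive here; what follows is my assumption of the typical global seed-setting lines being discussed, which seed Python's, NumPy's, and PyTorch's RNGs but not the DataLoader workers:)

```python
import random

import numpy as np
import torch

seed = 42
random.seed(seed)            # Python's built-in RNG
np.random.seed(seed)         # NumPy's global RNG
torch.manual_seed(seed)      # PyTorch CPU RNG
torch.cuda.manual_seed_all(seed)  # PyTorch GPU RNGs (no-op without a GPU)
```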
I don't think this should be the default behavior. In my opinion, the lines above should be enough to guarantee deterministic behavior. It is not obvious to a novice that, besides those lines, they also need to initialize the workers with the same seed to get deterministic behavior.
IMO, worker_init_fn allows some flexibility, but why shouldn't PyTorch's workers have a reasonable default? Something like the following code block, if it ran before worker_init_fn, would be backward compatible and would provide determinism out of the box, whether NumPy is installed or not.
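The code block referred to here was lost; a sketch of such a default, assuming each worker derives its seed from `torch.initial_seed()` (which PyTorch already sets per worker) and the function name `default_worker_init_fn` is mine:

```python
import random

import torch


def default_worker_init_fn(worker_id):
    # Derive a 32-bit seed from the base seed PyTorch already assigns
    # this worker, so Python's and NumPy's RNGs stay in sync with it.
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)
    try:
        import numpy as np
        np.random.seed(worker_seed)
    except ImportError:
        # NumPy is optional; the default still works without it.
        pass
```

This would be passed as `worker_init_fn=default_worker_init_fn` to `DataLoader` today; the suggestion is that it become the implicit default.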
Beware of running two processes on the same machine, even if you set random seeds and set num_workers=0. In my experience, one process running on its own is deterministic, but two processes running side by side are not. The consensus here is that static variables in dynamically linked libraries are to blame. If you need to compare runs, use two different machines.
(by 2 processes I mean two training scripts for example)
I’ve read this thread closely and tried every suggestion in it, and I still cannot get deterministic behavior during training. I’m using PyTorch 1.1.0 and torchvision 0.3.0.
Is there a place I’ve missed setting the seed? Looking at the first few mini-batches of training, the differences in accuracy across starts is pretty stark.
Also, I don’t even get deterministic behavior when I disable RandomResizedCrop and RandomHorizontalFlip in my image transforms, so the non-determinism must be coming from somewhere else.
Well, I added torch.backends.cudnn.enabled = False and it worked for me. I may have to recheck the results with torch.backends.cudnn.enabled = True.
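For what it's worth, a commonly suggested middle ground, rather than disabling cuDNN entirely, is to keep it enabled but force deterministic algorithms and turn off the benchmark autotuner (these are the standard `torch.backends.cudnn` flags; expect some speed loss):

```python
import torch

# Keep cuDNN enabled, but restrict it to deterministic algorithms
# and stop the autotuner from picking a (possibly non-deterministic)
# fastest kernel per input shape.
torch.backends.cudnn.enabled = True
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```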
I never got this to work, no. I’ve been doing this on Windows, though; not sure if that makes a difference. I’ve found that some PyTorch functionality isn’t optimized for Windows.
Hi, I have used exactly the same code, but my code is still non-deterministic and I am not able to reproduce the results. I tried both PyTorch 1.4.0 and 1.3.1.