Despite this, I still get wildly different results (a ~5% accuracy difference). While reading the PyTorch reproducibility guidelines, I came across this part, which I am not sure I fully understand:
“However, some applications and libraries may use NumPy Random Generator objects, not the global RNG (Random Generator — NumPy v1.26 Manual), and those will need to be seeded consistently as well.”
Maybe I need to set something else? Does anyone know how to take care of NumPy Random Generator objects? I appreciate your guidance.
P.S. I’m working with a Bayesian network (Variational BNN) in a continual learning setting.
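For reference, this is roughly how I seed the global RNGs (the `seed_everything` helper and the `SEED` value are just illustrative names); as far as I understand, a NumPy Generator created with `np.random.default_rng` has its own state and is not affected by `np.random.seed`, so it needs its own seed:

```python
import random

import numpy as np
import torch

SEED = 42  # illustrative value


def seed_everything(seed: int) -> None:
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy's legacy *global* RNG
    torch.manual_seed(seed)  # PyTorch CPU RNG and all CUDA device RNGs


seed_everything(SEED)

# A Generator object carries its own state: np.random.seed does NOT
# affect it, so it must be seeded explicitly.
rng = np.random.default_rng(SEED)
```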
Did you also set torch.use_deterministic_algorithms(True) as mentioned in the Reproducibility docs?
Seeding might not be enough to get deterministic and reproducible outputs if the algorithms themselves produce non-deterministic results.
And one more question regarding torch.use_deterministic_algorithms(True): where should I set it? I have many modules in my code that import torch. Should I add torch.use_deterministic_algorithms(True) after import torch in each of them?
Thank you for your reply. I added torch.use_deterministic_algorithms(True) after import torch in every module of my code that imports torch, and now I get the following error:
RuntimeError: Deterministic behavior was enabled with either torch.use_deterministic_algorithms(True) or at::Context::setDeterministicAlgorithms(true), but this operation is not deterministic because it uses CuBLAS and you have CUDA >= 10.2. To enable deterministic behavior in this case, you must set an environment variable before running your PyTorch application: CUBLAS_WORKSPACE_CONFIG=:4096:8 or CUBLAS_WORKSPACE_CONFIG=:16:8. For more information, go to cuBLAS
The line of code where the error comes from is: F.linear(input_means, self.weight, self.bias)
I tried setting the first environment variable before python main.py, but it hasn't worked so far.
Any ideas on this? Does it mean that there is no deterministic equivalent to F.linear()?
I should add that my CUDA driver version is 510.73.05 and my CUDA version is 11.6.
If you are still seeing the error after trying to set the env variable, it might have been too late in the script.
Set it as an external env variable or during the launch:
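E.g. either prefix the launch command (CUBLAS_WORKSPACE_CONFIG=:4096:8 python main.py, where main.py stands in for your actual entry script), or set it at the very top of the entry script before torch gets a chance to initialize CUDA; a sketch of the latter:

```python
# Set the variable inside Python *before* torch creates a CUDA context.
# The very top of the entry script, before "import torch", is the safe spot.
import os

os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch  # imported only after the env variable is in place

# use_deterministic_algorithms is a global, process-wide flag, so calling
# it once here covers every module that imports torch.
torch.use_deterministic_algorithms(True)
```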
Thank you very much. I added the variable before python script.py and it worked; I then got an error about the non-deterministic functions, so I added torch.use_deterministic_algorithms(True, warn_only=True).
Now, I get this warning instead:
UserWarning: nll_loss2d_forward_out_cuda_template does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True, warn_only=True)'. You can file an issue at Issues · pytorch/pytorch · GitHub to help us prioritize adding deterministic support for this operation. (Triggered internally at /opt/conda/conda-bld/pytorch_1646755897462/work/aten/src/ATen/Context.cpp:79.)
and my results are not identical. I am running a continual learning application (prior-based, using variational inference), and in two runs I get the following accuracies for task 1 and task 2:
Task 1: Run 1: 94.80%, Run 2: 96.40%
Task 2: Run 1: 83.80%, Run 2: 81.00%
and this gap continues for the upcoming tasks.
I wonder whether this is normal given just one remaining non-deterministic function, here nll_loss2d_forward_out_cuda?
It’s hard to tell as it would depend on the overall stability of your training.
I.e., assuming you could get deterministic results, you could rerun the script with different seeds and compare how the model performs in the end.
Since you are already using non-deterministic methods you wouldn't have that baseline, but the test might still be interesting.
Also, in case you are not on the latest PyTorch version, install 1.12.0 or the nightly release to check if this operation has a deterministic version now.
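After upgrading, a quick way to check which ops are still non-deterministic is to keep warn_only=True and record the warnings while running one training step; a sketch (insert your own forward/backward pass where indicated):

```python
import warnings

import torch

# Warn instead of raising on ops without a deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Run one training step of your model here; each op lacking a
    # deterministic implementation emits a UserWarning.
    ...

nondet = [w for w in caught if "deterministic" in str(w.message)]
# An empty list means every op you hit has a deterministic implementation
# on this PyTorch version.
```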