When I use cuDNN in PyTorch, i.e. set torch.backends.cudnn.enabled = True, my training loss is different every time I start training.
This is because cudnn uses some non-deterministic algorithms (for speed reasons).
You can force it to use deterministic ones by setting
torch.backends.cudnn.deterministic = True, but you might see a performance hit.
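As a sketch, the relevant flags look like this (torch.backends.cudnn.benchmark is disabled as well, since the auto-tuner can pick different algorithms between runs):

```python
import torch

# Keep cuDNN enabled, but force it to choose deterministic algorithms.
torch.backends.cudnn.enabled = True
torch.backends.cudnn.deterministic = True
# The benchmark auto-tuner can select different kernels run-to-run,
# so it is usually turned off for reproducibility.
torch.backends.cudnn.benchmark = False
```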
Thank you for your reply. However, I tried setting torch.backends.cudnn.deterministic = True and I still get a different training loss. If I set torch.backends.cudnn.enabled = False, then I do get the same training loss every time.
Is there anything else I need to change?
How much of a difference are you seeing? Even with deterministic flag set to False, the difference shouldn’t be large.
Yeah, they don't differ by much. I tried four times; here are the epoch 1 results:
- loss: 789.759372 (cuDNN enabled)
- loss: 789.788586 (cuDNN enabled, second run)
- loss: 790.352628 (cuDNN enabled, deterministic = True)
- loss: 790.901784 (cuDNN enabled, deterministic = True, second run)
But I only have a small test dataset for video classification, and with cuDNN enabled the results after the same number of epochs (e.g. 150) are slightly different each time. So I want to know whether there is a way to get fixed, reproducible results while keeping cuDNN enabled.
The cuDNN deterministic flag should allow you to do what you want; there might be another source of randomness in your code.
Make sure you set the random seed for every random part of your code (Python's random module, the PyTorch CPU RNG, and/or the PyTorch GPU RNG).
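A minimal sketch of seeding all of these sources might look like the following (the seed value 42 is an arbitrary choice):

```python
import random
import torch

SEED = 42  # any fixed value works; 42 is arbitrary

random.seed(SEED)                  # Python's built-in RNG
torch.manual_seed(SEED)            # PyTorch CPU RNG
torch.cuda.manual_seed_all(SEED)   # PyTorch GPU RNG, all devices

# Combine with the cuDNN flags for reproducible runs:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

If you also use NumPy anywhere in your data pipeline, seed it the same way with np.random.seed(SEED).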
Could you provide a small code snippet that we could run to reproduce the issue please?