When I use cuDNN in PyTorch, i.e. set torch.backends.cudnn.enabled = True, my training loss is different every time I start training.
This is because cudnn uses some non-deterministic algorithms (for speed reasons).
You can force it to use deterministic ones by setting
torch.backends.cudnn.deterministic = True, but you might see a performance hit.
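As a sketch, the relevant flags look like this (torch.backends.cudnn.benchmark is disabled as well, since the auto-tuner can pick different algorithms between runs):

```python
import torch

# Keep cuDNN enabled, but force it to choose deterministic algorithms.
torch.backends.cudnn.enabled = True
torch.backends.cudnn.deterministic = True
# The benchmark auto-tuner can select different kernels run-to-run,
# so it is usually turned off for reproducibility.
torch.backends.cudnn.benchmark = False
```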
Thank you for your reply. However, I tried setting torch.backends.cudnn.deterministic = True and I still get a different training loss. If I set torch.backends.cudnn.enabled = False, then I do get the same training loss every time.
Is there anything else I need to change?
How much of a difference are you seeing? Even with deterministic flag set to False, the difference shouldn’t be large.
Yeah, they don't differ by much. I tried four times; here are the epoch 1 results:
- loss: 789.759372 (cuDNN enabled)
- loss: 789.788586 (cuDNN enabled, second run)
- loss: 790.352628 (cuDNN enabled, deterministic = True)
- loss: 790.901784 (cuDNN enabled, deterministic = True, second run)
But I only have a small test dataset for video classification, and with cuDNN enabled the results after the same number of epochs (e.g. 150) are slightly different each time. So I want to know whether there is a way to get fixed, reproducible results while keeping cuDNN enabled.
The cuDNN deterministic flag should allow you to do what you want; there might be another source of randomness in your code.
Make sure you set the random seed for every random part of your code (Python's random module, the PyTorch CPU RNG, and/or the PyTorch GPU RNG).
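A minimal sketch of seeding all of these sources might look like the following (the seed value 42 is an arbitrary choice):

```python
import random
import torch

SEED = 42  # any fixed value works; 42 is arbitrary

random.seed(SEED)                  # Python's built-in RNG
torch.manual_seed(SEED)            # PyTorch CPU RNG
torch.cuda.manual_seed_all(SEED)   # PyTorch GPU RNG, all devices

# Combine with the cuDNN flags for reproducible runs:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

If you also use NumPy anywhere in your data pipeline, seed it the same way with np.random.seed(SEED).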
Could you provide a small code snippet that we could run to reproduce the issue please?