Effect of torch.backends.cudnn.deterministic=True

As far as I understand, setting torch.backends.cudnn.deterministic=True together with torch.backends.cudnn.benchmark = False in your code (along with setting seeds) should make the code run deterministically.

However, for reasons I don’t understand, removing those two lines always gives worse results. Even with deterministic mode set for cuDNN and elsewhere, I still don’t get fully identical results between runs, but removing it causes my loss to stop going any lower (top lines in the attached image). What am I doing wrong?

Environment:

  • PyTorch Lightning 1.7
  • Using DDP accelerator
  • Setting seeds for random, NumPy, and torch (manual_seed_all)
  • Using the code below as the worker_init_fn of the DataLoader:
import random
import numpy as np
import torch

def loader_init_fn(worker_id):
    # DataLoader passes a worker id, not a seed; derive a
    # per-worker seed from torch's base seed for this worker
    seed = torch.initial_seed() % 2**32
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
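For reference, this is roughly how I wire that function into the DataLoader (the toy TensorDataset here is only illustrative, standing in for my real dataset):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def loader_init_fn(worker_id):
    # seed random/NumPy per worker from torch's base seed
    seed = torch.initial_seed() % 2**32
    random.seed(seed)
    np.random.seed(seed)

# toy dataset standing in for the real one
dataset = TensorDataset(torch.arange(8, dtype=torch.float32))
loader = DataLoader(dataset, batch_size=2, num_workers=2,
                    worker_init_fn=loader_init_fn)

for (batch,) in loader:
    print(batch.shape)
```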

Hi,

You can check our doc on reproducibility for the full details: Reproducibility — PyTorch 1.7.1 documentation
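As a minimal sketch, the settings that doc describes boil down to something like the following (the helper name configure_determinism is illustrative, not an official API):

```python
import random

import numpy as np
import torch

def configure_determinism(seed: int) -> None:
    # seed every RNG and pin cuDNN to deterministic behavior
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # seeds CPU and all CUDA devices
    torch.backends.cudnn.deterministic = True  # force deterministic kernels
    torch.backends.cudnn.benchmark = False     # disable the autotuner

configure_determinism(42)
```

Note that full determinism also depends on the ops your model uses; some CUDA kernels have no deterministic implementation, which the linked doc covers in detail.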

Hi,
Thanks for response. I understand the reproducibility requirements, but my question is, will disabling CuDNN benchmarking have an impact on the result - i.e., the operations themselves, not how fast they are performed?

Thanks!

The different cuDNN setups should yield approximately the same final result. If that’s not the case, you might be hitting a faulty kernel.
To isolate it, we would need the model definition, the input shapes, the cuDNN version, and the GPU used. With this information we can create the cuDNN logs and check internally against reference implementations for correctness.
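A quick way to collect the version details is a small snippet like the one below (the report_env helper is just a sketch; `python -m torch.utils.collect_env` prints a fuller official report):

```python
import torch

def report_env() -> dict:
    # gather the version details requested above
    info = {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,               # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
    }
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info

print(report_env())
```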