How to get deterministic behavior?

dlmacedo · May 16, 2018, 4:34am

I am using:

cudnn.benchmark = False
cudnn.deterministic = True

random.seed(1)
numpy.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)

And still not getting deterministic behavior…

ptrblck · May 16, 2018, 10:53am

How large are the differences?
Could you provide a code snippet showing the error?

If the absolute error is approx. 1e-6, it might be due to limited floating-point precision.

dlmacedo · May 16, 2018, 11:41am

github.com/dlmacedo/deep-learning-class/blob/master/not_deterministic.py

import argparse
import sys
import os
import time
import random
import numpy
import pandas as pd
import torch
import torch.backends.cudnn as cudnn
import torch.nn as nn
import torch.nn.parallel
import torch.optim
import torch.utils.data
import torchvision.transforms as transforms
import torchvision.models as torchvision_models
import torchnet as tnt

from tqdm import tqdm
from torch.autograd import Variable
from tensorboardX import SummaryWriter

This file has been truncated.

dlmacedo · May 16, 2018, 11:42am

The differences are meaningful… Not of the order of 1e-6…

The differences are about 1%, more or less…

dlmacedo · May 16, 2018, 3:17pm

The problem is in the data augmentation transformations:

transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),

After I removed the above code, it worked 100% deterministically…

I am using Pytorch 0.3.1

Does torchvision have a different seed from torch?

dlmacedo · May 16, 2018, 3:27pm

The following appears to be the same issue:

If I use num_workers=0, I can get back the augmentation without losing the deterministic behavior, exactly as reported in the link above.

SimonW · May 16, 2018, 3:57pm

This has been fixed in 0.4, provided you set random.seed in worker_init_fn.

Furthermore, you might want to set torch.backends.cudnn.deterministic=True
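
A minimal sketch of what seeding inside worker_init_fn can look like, assuming PyTorch 0.4 or later with NumPy installed. The names seed_worker and toy_dataset are hypothetical and only serve the illustration. It relies on the documented DataLoader behavior that each worker gets its own PyTorch seed derived from the main-process seed, so torch.initial_seed() inside a worker is reproducible once torch.manual_seed has been called.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Inside a worker process, torch.initial_seed() returns that worker's
    # own seed. The legacy NumPy seeding API only accepts 32-bit values,
    # so keep the low 32 bits.
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)
    np.random.seed(worker_seed)


torch.manual_seed(1)

# Toy stand-in for a real dataset (hypothetical, for illustration only).
toy_dataset = TensorDataset(torch.arange(8).float())

loader = DataLoader(
    toy_dataset,
    batch_size=4,
    shuffle=True,
    num_workers=2,
    worker_init_fn=seed_worker,  # pass the function itself, do not call it
)
```

Workers are only started when the loader is iterated, so seed_worker runs once per worker at that point, before any sample is loaded.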

dlmacedo · May 16, 2018, 7:55pm

It worked. Thanks. Nevertheless, I think we should have a more straightforward way to get deterministic behavior. This “workaround” in the workers is not so intuitive for beginners.

dlmacedo · May 29, 2018, 4:40am

Based on my tests, even in PyTorch 0.4 we still need to initialize the workers with the same seed to get deterministic behavior. The following lines are NOT enough:

cudnn.benchmark = False
cudnn.deterministic = True

random.seed(1)
numpy.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)

I think this should not be the standard behavior. In my opinion, the above lines should be enough to provide deterministic behavior. It is not obvious to a novice that, besides the above lines, they also need to initialize the workers with the same seed to get deterministic behavior.

SimonW · June 20, 2018, 4:34pm

The problem is numpy. We can’t assume that numpy exists so you’d have to set the seed for numpy in workers yourself.

ndronen · June 21, 2018, 9:56pm

IMO, worker_init_fn allows some flexibility, but why shouldn’t PyTorch’s workers have a reasonable default behavior? Something like the following code block, if it were to execute before worker_init_fn, would be backward compatible and would provide determinism out of the box, whether Numpy is installed or not.

try:
    import numpy
    torch_seed = torch.initial_seed()
    # Numpy expects unsigned 32-bit integer seeds, so keep the low 32 bits.
    np_seed = torch_seed % 2**32
    numpy.random.seed(np_seed)
except Exception:
    pass
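
NumPy's legacy numpy.random.seed only accepts integers from 0 to 2**32 - 1, so a 64-bit PyTorch seed has to be reduced before it is passed on. The sketch below compares two ways of doing that on plain integers: modulo, and floor division followed by a subtraction. The helper name to_numpy_seed is hypothetical.

```python
NUMPY_SEED_LIMIT = 2**32  # legacy numpy.random.seed accepts 0 .. 2**32 - 1


def to_numpy_seed(torch_seed: int) -> int:
    """Map a non-negative integer seed into NumPy's 32-bit range."""
    return torch_seed % NUMPY_SEED_LIMIT


typical_seed = 1        # e.g. after torch.manual_seed(1)
large_seed = 2**40 + 7  # a seed wider than 32 bits

# `//` and `**` bind tighter than `-`, so this is (seed // 2**32) - 1,
# which is negative for every seed below 2**32.
floor_div_result = typical_seed // 2**32 - 1

print(floor_div_result, to_numpy_seed(typical_seed), to_numpy_seed(large_seed))
# prints: -1 1 7
```

A negative value is rejected by numpy.random.seed, and a broad except clause around the call would hide that error, leaving NumPy unseeded without any warning.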

surojit_sengupta · January 1, 2019, 3:55pm

So does torch.backends.cudnn.deterministic=True make any form of random data pre-processing using torch libraries deterministic?

iainmelvin · April 3, 2019, 9:31pm

Beware running 2 processes on the same machine, even if you set random seeds and set num_workers=0. In my experience, running one process on its own is deterministic, but running 2 processes side by side is not. The consensus here is that static variables in dynamically linked libraries are to blame. If you need to compare, run on two different machines.
(By 2 processes I mean two training scripts, for example.)

taromakino · June 25, 2019, 4:22pm

def set_seed(seed):
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

The final line solved this issue for me.

dstanner · October 12, 2019, 5:41am

I’ve read this thread closely and tried everything I can see on here as suggestions, and I still cannot get deterministic behavior during training. I’m using Pytorch 1.1.0 and Torchvision 0.3.0.

In my code I have the following:

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
torch.manual_seed(1)
torch.cuda.manual_seed_all(1)
np.random.seed(1)
random.seed(1)

And my dataloader looks like this:

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=128,
    num_workers=0,
    shuffle=True,
    pin_memory=True,
    worker_init_fn=random.seed(1)
)

Is there a place I’ve missed setting the seed? Looking at the first few mini-batches of training, the differences in accuracy across runs are pretty stark.

Also, I don’t even get deterministic behavior when I disable RandomResizedCrop and RandomHorizontalFlip in my image transforms – so the non-deterministic behavior is happening somewhere else.

Any clues or pointers would be appreciated.
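
One detail in the loader above may be worth checking. The argument worker_init_fn=random.seed(1) calls random.seed immediately and passes its return value, which is None, so no init function is actually registered. The sketch below shows this with the standard library only. The helper seed_worker and its base_seed parameter are hypothetical.

```python
import random
from functools import partial

# Calling random.seed(...) seeds the current process and returns None,
# so `worker_init_fn=random.seed(1)` is the same as `worker_init_fn=None`.
result_of_call = random.seed(1)
print(result_of_call)  # None


def seed_worker(worker_id, base_seed=1):
    # Hypothetical helper: give each worker its own reproducible seed.
    random.seed(base_seed + worker_id)


# Either of these is a callable that could be passed as worker_init_fn:
init_fn = seed_worker
init_fn_seed_7 = partial(seed_worker, base_seed=7)
```

With num_workers=0 the data is loaded in the main process and, as far as I know, worker_init_fn is never called at all, so this argument is unlikely to explain the differences described here. It only starts to matter once num_workers is greater than 0.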

xiaoyanzhuo · May 2, 2020, 3:58pm

Based on previous answers, and official docs, what I did works for me.

def set_seed(seed):
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

set_seed(0) # 0 or any number you want for the seed

In the DataLoader(), I also set num_workers = 0.

This setting makes my results reproducible.
I am using the latest PyTorch version, 1.5.0.
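
A quick way to check that a set_seed function like the one above has the intended effect is to seed, draw from each generator, reseed, and draw again. The sketch below does that on the CPU. The helper draw_samples is hypothetical, and the check only covers the random number streams, not nondeterministic GPU kernels.

```python
import random

import numpy as np
import torch


def set_seed(seed):
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # ignored when CUDA is unavailable
    np.random.seed(seed)
    random.seed(seed)


def draw_samples():
    # One draw from each generator that set_seed is meant to control.
    return random.random(), float(np.random.rand()), torch.rand(3).tolist()


set_seed(0)
first = draw_samples()
set_seed(0)
second = draw_samples()

print(first == second)
```

If the two draws match but training still differs between runs, the remaining variation probably comes from somewhere other than these three generators, for example data loading workers or GPU operations.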

arnabsinha · June 9, 2020, 6:39am

Will this work for Pytorch version 1.0.1 as well?

xiaoyanzhuo · June 9, 2020, 4:25pm

Not sure about 1.0.1. I suppose if the same options are available, it may work as well. You can try.

arnabsinha · June 9, 2020, 5:35pm

Well, I added torch.backends.cudnn.enabled = False and it worked for me. I may have to recheck the results with torch.backends.cudnn.enabled = True.

Tethys_Sun · August 24, 2020, 10:27pm

Hi @dstanner, did you solve it? This doesn’t work for me either.