Can't achieve reproducibility / determinism in PyTorch training

I am implementing an active machine learning object detection pipeline with PyTorch inside a Jupyter notebook. I am using Faster R-CNN, COCO annotations, the SGD optimizer and GPU training.

To check determinism, I run one epoch of training twice and get different losses at the end of each run. The loss after the first step is always the same, so initialization is not the problem.

What i already tried:

  • I made sure the images are fed in the same order
  • restarted the Jupyter kernel between training runs
  • batch_size = 1, num_workers = 1, disabled augmentation
  • CPU training is deterministic(!)
  • the following seeds are set:
    – seed_number = 2
    – torch.backends.cudnn.deterministic = True
    – torch.backends.cudnn.benchmark = False
    – random.seed(seed_number)
    – torch.manual_seed(seed_number)
    – torch.cuda.manual_seed(seed_number)
    – np.random.seed(seed_number)
    – os.environ['PYTHONHASHSEED'] = str(seed_number)
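One way to narrow this down is to run the same seeded loop twice in a single process and compare the per-step losses bitwise, reporting the first step where they diverge. The sketch below is only an illustration: `seed_everything`, `run_epoch` and `first_divergence` are hypothetical helpers, and the tiny MLP on synthetic data stands in for the real Faster R-CNN training step.

```python
import random

import numpy as np
import torch


def seed_everything(seed):
    """Hypothetical helper: seed every RNG the loop below touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and all CUDA generators
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def run_epoch(seed, device, steps=20):
    """Stand-in for one training epoch (tiny MLP, SGD, fixed synthetic data).
    Returns the loss after every step."""
    seed_everything(seed)
    model = torch.nn.Sequential(
        torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1)
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(64, 8, device=device)
    y = torch.randn(64, 1, device=device)
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


def first_divergence(run_a, run_b):
    """Index of the first step whose losses differ bitwise, or None."""
    return next((i for i, (a, b) in enumerate(zip(run_a, run_b)) if a != b), None)


device = "cuda" if torch.cuda.is_available() else "cpu"
first, second = run_epoch(2, device), run_epoch(2, device)
step = first_divergence(first, second)
print("identical" if step is None else f"runs diverge at step {step}")
```

Knowing the first diverging step tells you which batch (and therefore which ops) to inspect, instead of only comparing the loss at the end of the epoch.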

Here is a link to the primary code for the training:
my code inside a Colab
(it's not functional, since I just copied it out of my local Jupyter notebook, but it shows what I am trying to do)

The training logs for two identical runs look like this:


Please let me know if any additional information is needed 🙂

Hi,

Did you have a look at the reproducibility notes?

Yes, I applied all the measures suggested in the notes.
However, I have heard that "the seeds do not behave globally".
Do you have any information on this, or other ideas I could try to achieve determinism?

In my experience, it is very, very hard.
It also won't be reproducible as soon as you update any library, your hardware or your code, so it's usually not very useful.

Although I have heard that "the seeds do not behave globally".

Not sure what that means. If you use a single process, it will work as expected.

Hmm, okay.
So in your experience, it is impossible to achieve true determinism in PyTorch GPU training?

Across hardware and library versions, no.
Unfortunately, floating-point operations are not associative, so if any library changes the order of a single op, the whole thing breaks.
The same happens if your GPU has a different number of processing units and therefore splits the work differently.
And so on.
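A small pure-Python illustration of why reordering breaks bitwise equality: regrouping the same additions changes the rounded result, and a toy "parallel reduction" (`chunked_sum`, a hypothetical stand-in for how a GPU kernel might split a sum across units) gives a different answer depending on how many chunks the work is split into.

```python
# Floating-point addition is not associative: the grouping changes the rounded result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b, a == b)  # 0.6000000000000001 0.6 False


def chunked_sum(values, n_chunks):
    """Toy model of a parallel reduction: each 'processing unit' sums one
    contiguous chunk, then the partial sums are added together."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


values = [1.0, 1e16, -1e16, 1.0]
# Same numbers, different split across "units" -> different answer.
print(chunked_sum(values, 1), chunked_sum(values, 2))  # 1.0 0.0
```

Real kernels differ by far smaller amounts, but during training such tiny differences compound from step to step.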

That being said, for fixed hardware and library versions, we try to be deterministic.
I'm just saying this so that you don't spend several days getting things reproducible on your machine, only to realize it doesn't work when you switch machines 🙂

Knowing that, if you still want reproducibility for that hardware and version, I would track down the operations listed as non-deterministic in the reproducibility notes (maxpool, I'm looking at you).
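One way to do that tracking down is `torch.use_deterministic_algorithms(True, warn_only=True)`, which (in PyTorch 1.11 and later, where `warn_only` exists) warns instead of raising when an op without a deterministic implementation runs. The sketch below assumes those warnings surface as regular Python warnings; `find_nondeterministic_ops` and the `step_fn` callable are hypothetical names, not part of any library.

```python
import os
import warnings

# cuBLAS needs this on CUDA >= 10.2 when deterministic algorithms are requested;
# it has to be set before CUDA is initialized.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch


def find_nondeterministic_ops(step_fn):
    """Run one training step in warn-only deterministic mode and return the
    distinct warning messages emitted (e.g. for ops lacking deterministic kernels)."""
    previous = torch.are_deterministic_algorithms_enabled()
    torch.use_deterministic_algorithms(True, warn_only=True)
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            step_fn()
    finally:
        torch.use_deterministic_algorithms(previous)  # restore the previous mode
    # Deduplicate while keeping first-seen order.
    return list(dict.fromkeys(str(w.message) for w in caught))


# Usage with a tiny CPU stand-in for a real training step:
messages = find_nondeterministic_ops(
    lambda: torch.nn.Linear(4, 2)(torch.randn(3, 4)).sum().backward()
)
for message in messages:
    print(message)
```

With the real model, you would pass something like `lambda: train_one_step(model, batch)` (a hypothetical function) running on the GPU and read off which ops show up.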

That sounds better already 🙂
I am developing a pipeline for my master's thesis on my server, so the setup will stay exactly the same until I finish my thesis experiments.
Therefore, I am willing to invest a bit in determinism.

Hi,

I am having the same issue. I cannot get reproducible results when training the Faster R-CNN model in PyTorch. I have followed everything in the Reproducibility doc.

I set the seed at the beginning of my code as follows:

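import os
import random

import numpy as np
import torch

g = torch.Generator()
g.manual_seed(10)  # controls the DataLoader's shuffle order

def seed_worker(worker_id):
    # Only called when num_workers > 0: seed NumPy/random from torch's per-worker seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

def set_seed(seed):
    # Only affects processes started after this line; the current interpreter's
    # hash seed is fixed at startup
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.enabled = False
    #torch.use_deterministic_algorithms(True)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

set_seed(10)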
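On the `PYTHONHASHSEED` line: Python fixes the string-hash seed when the interpreter starts, so assigning it in `os.environ` inside a running script or Jupyter kernel only influences processes started afterwards. The sketch below (with a hypothetical helper `hash_in_fresh_interpreter`) shows that two fresh interpreters launched with the same `PYTHONHASHSEED` agree on `hash()`; for a notebook, that means the variable would have to be in the kernel's environment before it starts.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, hash_seed):
    """Compute hash(text) in a new Python process started with PYTHONHASHSEED set."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Same seed in two fresh processes -> same string hash.
print(hash_in_fresh_interpreter("nodule", 10) == hash_in_fresh_interpreter("nodule", 10))  # True
```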

I have also disabled augmentation and set the number of workers to 0 in the data loader:

data_loader = torch.utils.data.DataLoader(
            dataset, batch_size=2, shuffle=True, num_workers=0,
            collate_fn=utils.collate_fn, worker_init_fn=seed_worker, generator=g)
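A quick check, as a sketch, that a seeded `generator` makes the shuffle order reproducible. The important detail for Jupyter is that the generator has to be re-seeded for every run: `g` above is seeded once, so re-running only the training cell in the same kernel continues from an advanced generator state and yields a different order. `batch_order` is a hypothetical helper and the 10-element `TensorDataset` stands in for the real dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def batch_order(seed, epochs=1):
    """Return the batches a shuffling DataLoader yields, per epoch, with a freshly seeded generator."""
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(TensorDataset(torch.arange(10)), batch_size=2,
                        shuffle=True, num_workers=0, generator=g)
    return [[batch[0].tolist() for batch in loader] for _ in range(epochs)]


run_a = batch_order(10, epochs=2)
run_b = batch_order(10, epochs=2)
print(run_a == run_b)  # True: same seed, same sequence of epochs
```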

Model is created as follows:

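import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Inside the trainer class. set_seed() must run before this, because the new
# box predictor head is randomly initialized from the torch RNG.
self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
num_classes = 2  # 1 class (nodule) + background
in_features = self.model.roi_heads.box_predictor.cls_score.in_features
self.model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
self.model.to(self.device)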

I use CUDA 10.2 with PyTorch 1.10.0, so I have also set the environment variable CUBLAS_WORKSPACE_CONFIG as mentioned in the tutorial. Do you have any idea why the results are not reproducible, and any suggestions for what I could try?
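The order matters here: as far as I know, cuBLAS reads `CUBLAS_WORKSPACE_CONFIG` when it is initialized, so setting it after CUDA work has already happened in the same kernel may have no effect. A minimal sketch of the ordering, assuming a fresh kernel and the `:4096:8` value from the reproducibility notes:

```python
import os

# First thing in a fresh kernel, before torch touches CUDA/cuBLAS.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

# Raise an error (instead of silently continuing) whenever an op
# without a deterministic implementation is called.
torch.use_deterministic_algorithms(True)
print(torch.are_deterministic_algorithms_enabled())  # True
```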

Also, if I set torch.use_deterministic_algorithms(True) after setting CUBLAS_WORKSPACE_CONFIG, I get the following error:

RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel() INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Indexing.cu":250, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor10231

Many thanks, Ecem

I think the issue I mentioned was recently fixed in PyTorch. I installed the latest nightly build ('1.11.0.dev20211130+cu102'), and I no longer get the following error:

RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel() INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Indexing.cu":250, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor10231

Now I can set torch.use_deterministic_algorithms(True) in my code, but the results are still not reproducible.
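Since the loss after the first step matches, one further step is to save the model's `state_dict()` after a few iterations in each run and compare them tensor by tensor; the first mismatching entry hints at which layer (and so which kernel) drifts first. `first_mismatched_tensor` is a hypothetical helper, and the two small `Linear` layers only demonstrate it; for real runs, load both checkpoints onto the CPU before comparing.

```python
import torch


def first_mismatched_tensor(state_a, state_b):
    """Return (name, max_abs_diff) for the first entry that differs bitwise, or None.
    Assumes both state dicts come from the same architecture and live on the same device."""
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        if not torch.equal(tensor_a, tensor_b):
            diff = (tensor_a.double() - tensor_b.double()).abs().max().item()
            return name, diff
    return None


# Demonstration: identical copies match; a perturbed bias is reported.
a = torch.nn.Linear(4, 2)
b = torch.nn.Linear(4, 2)
b.load_state_dict(a.state_dict())
print(first_mismatched_tensor(a.state_dict(), b.state_dict()))  # None
with torch.no_grad():
    b.bias[0] += 1e-3
print(first_mismatched_tensor(a.state_dict(), b.state_dict())[0])  # bias
```

In the training script, this could look like `torch.save(model.state_dict(), "run_a.pt")` after N steps in the first run, then `first_mismatched_tensor(torch.load("run_a.pt", map_location="cpu"), {k: v.cpu() for k, v in model.state_dict().items()})` at the same step of the second run.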

Thanks!