Can't achieve reproducibility / determinism in PyTorch training

I am implementing an active-learning object detection pipeline with PyTorch inside a Jupyter notebook. I am using Faster R-CNN, COCO annotations, the SGD optimizer, and GPU training.

To check determinism, I run one epoch of training twice and get different losses by the end of each run. The loss after the first step is always the same, so initialization is not the problem.

What I have already tried:

  • I made sure the images are fed in the same order
  • Jupyter kernel restarted between training runs
  • batch_size = 1, num_workers = 1, disabled augmentation
  • CPU training is deterministic(!)
  • the following seeds are set:
    – seed_number = 2
    – torch.backends.cudnn.deterministic = True
    – torch.backends.cudnn.benchmark = False
    – random.seed(seed_number)
    – torch.manual_seed(seed_number)
    – torch.cuda.manual_seed(seed_number)
    – np.random.seed(seed_number)
    – os.environ['PYTHONHASHSEED'] = str(seed_number)
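For reference, the seeding steps listed above, collected into one helper, look roughly like this (`seed_everything` is just an illustrative name, not part of my actual code):

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed_number: int = 2) -> None:
    """Seed every RNG used in the pipeline and lock cuDNN down."""
    os.environ["PYTHONHASHSEED"] = str(seed_number)
    random.seed(seed_number)
    np.random.seed(seed_number)
    torch.manual_seed(seed_number)
    # manual_seed_all also covers multi-GPU setups; it is a no-op on CPU-only builds
    torch.cuda.manual_seed_all(seed_number)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Calling this once at the top of the notebook (before building the model and data loaders) should make every subsequent RNG draw repeatable.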

Here is a link to the primary code for the training:
my code inside a colab
(it's not functional, since I just copied it out of my local Jupyter notebook, but it shows what I am trying to do)

The training logs for two identical runs look like this:


Please let me know if any additional information is needed :slight_smile:

Hi,

Did you have a look at the reproducibility notes?

Yes, I applied all the measures suggested in the notes.
Although I have heard that "the seeds do not behave globally".
Do you have any information on this, or other ideas I could try to achieve determinism?

In my experience, it is very very hard.
It also won't be reproducible as soon as you update any library, your hardware, or your code. So it is usually not very useful.

Although I have heard that "the seeds do not behave globally".

Not sure what that means. If you use a single process, it will work as expected.
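One pitfall that remark might refer to is DataLoader worker processes: each worker has its own `numpy`/`random` state, so the reproducibility notes recommend re-seeding workers via `worker_init_fn` and passing a seeded `generator` for shuffling. A rough sketch (the `TensorDataset` here is just a toy stand-in for your COCO dataset):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Derive per-worker numpy/random seeds from the torch seed,
    # as suggested in the PyTorch reproducibility notes.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(2)

dataset = TensorDataset(torch.arange(10))  # toy stand-in for the real dataset
loader = DataLoader(dataset, batch_size=1, shuffle=True, num_workers=1,
                    worker_init_fn=seed_worker, generator=g)
```

Re-seeding `g` before a run then reproduces the same shuffle order across runs.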

Hmm okay.
So in your experience it is impossible to achieve true determinism in PyTorch GPU training?

Across hardware and library versions, no.
Unfortunately, floating-point operations are not associative. So if any library changes the order of a single op, then the whole thing breaks.
Or your GPU has a different number of processing units and so splits the work differently.
etc.
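You can see the non-associativity in plain Python, without any GPU involved:

```python
# The same three numbers summed in two different orders give
# two different floating-point results.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a)       # 0.6000000000000001
print(b)       # 0.6
print(a == b)  # False
```

A GPU reduction that merely regroups its partial sums can therefore produce a slightly different loss, which then compounds over training steps.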

That being said, for a fixed hardware and version, we try to be deterministic.
I'm just saying this so that you don't spend several days getting things reproducible on your machine and then realize it doesn't work when you switch machines :slight_smile:

Knowing that, if you still want reproducibility for that fixed hardware and version, I would track down the operations that are listed as non-deterministic in the reproducibility notes (max pooling, I'm looking at you).
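On recent PyTorch versions you can also make such ops fail loudly instead of hunting them down by hand. A minimal sketch, assuming your version has `torch.use_deterministic_algorithms` and you are on CUDA >= 10.2 (which additionally needs the `CUBLAS_WORKSPACE_CONFIG` variable):

```python
import os

# Must be set before the first CUDA context is created
# (cuBLAS determinism requirement on CUDA >= 10.2).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

# Make PyTorch raise an error whenever a non-deterministic op is used,
# instead of silently producing run-to-run differences.
torch.use_deterministic_algorithms(True)
```

Any forward or backward pass that hits an op without a deterministic implementation will then raise a RuntimeError naming the offending op, which tells you exactly what to replace.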

That sounds better already :slight_smile:
I am developing a pipeline on my server for my master's thesis, so the setup will stay exactly the same until I finish my experiments.
Therefore I am ready to invest a bit into determinism.