Should PyTorch model serialization be deterministic?

Basically, the subject says it all. I want my code to run deterministically, so I’m using the following initialization:

manual_seed = 1234

During training, the loss function values (and some other metrics) are always the same from run to run, which makes me think the model itself is deterministic. At (deterministic) checkpoints the model is saved to model_filename.

However, the diff tool indicates a difference between model files from run to run, hence the question.
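One way to narrow this down is to compare the tensors stored in two checkpoints instead of the raw file bytes. The sketch below is only an illustration: it assumes the checkpoints are state dicts written with `torch.save`, and `state_dicts_equal`, `roundtrip`, and `build_model` are hypothetical helper names, not anything from the original code.

```python
import io

import torch


def state_dicts_equal(a, b):
    """Return True if both state dicts have the same keys and bitwise-equal tensors."""
    if a.keys() != b.keys():
        return False
    return all(torch.equal(a[k], b[k]) for k in a)


def roundtrip(state_dict):
    """Serialize a state dict to an in-memory buffer and load it back."""
    buf = io.BytesIO()
    torch.save(state_dict, buf)
    buf.seek(0)
    return torch.load(buf, map_location="cpu")


def build_model(seed):
    """Hypothetical stand-in for a training run: seed, then create the model."""
    torch.manual_seed(seed)
    return torch.nn.Linear(4, 2)


# Two "runs" with the same seed and one with a different seed.
sd_a = roundtrip(build_model(1234).state_dict())
sd_b = roundtrip(build_model(1234).state_dict())
sd_c = roundtrip(build_model(4321).state_dict())

same_seed_match = state_dicts_equal(sd_a, sd_b)
different_seed_match = state_dicts_equal(sd_a, sd_c)
```

For real checkpoint files, the same comparison would apply to `torch.load(path, map_location="cpu")` for each file. If the tensors match while the files differ, the difference lies in the serialized container rather than in the weights.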


Some of the model operations are non-deterministic, especially those involving multi-threaded OpenMP optimizations and CUDA operations.

So this might be normal behavior.

With GPUs it’s notoriously hard to get both determinism and performance at the same time, so we choose performance.
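For anyone who prefers determinism over speed, PyTorch exposes a few opt-in switches. The following is a minimal sketch, not a guarantee of bit-identical results on every setup: `make_deterministic` and `run_once` are hypothetical helper names, and limiting the thread count is an assumption made here to take thread scheduling out of the picture.

```python
import random

import torch


def make_deterministic(seed=1234, single_thread=False):
    """Seed the RNGs and ask PyTorch to prefer deterministic kernels."""
    random.seed(seed)
    torch.manual_seed(seed)  # seeds the generators on all devices
    # Raise an error when an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    # Disable cuDNN autotuning and request deterministic cuDNN kernels.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    if single_thread:
        # Assumption: one intra-op thread, at a performance cost.
        torch.set_num_threads(1)


def run_once(seed=1234):
    """Tiny CPU-only forward pass used to check run-to-run repeatability."""
    make_deterministic(seed)
    model = torch.nn.Linear(8, 3)
    x = torch.randn(16, 8)
    return model(x).sum().item()


first = run_once()
second = run_once()
```

On CUDA, some operations additionally require the `CUBLAS_WORKSPACE_CONFIG` environment variable to be set before they will run in deterministic mode, and these settings usually cost throughput.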

Did you ever figure out the source of the differences the diff reports?
Same issue here :slight_smile: