Should PyTorch model serialization be deterministic?

vladimir · May 2, 2017, 5:52am

Basically, the question is in the title. I want my code to run deterministically, so I'm using the following initialization:

import random
import numpy as np
import torch

manual_seed = 1234
random.seed(manual_seed)
np.random.seed(manual_seed)
torch.manual_seed(manual_seed)
torch.cuda.manual_seed(manual_seed)

During training, the loss values (and some other metrics) are identical from run to run, which makes me think the model itself is deterministic. At checkpoints (which also fall at the same points in every run), the model is saved as follows:

torch.save(net.state_dict(), model_filename)

However, a diff tool reports differences between the saved model files from run to run, hence the question.

Thanks,

smth · May 3, 2017, 3:20am

Some model operations are non-deterministic, especially those that rely on multi-threaded OpenMP parallelism and CUDA operations.

So this might be normal behavior.

With GPUs it’s notoriously hard to get both determinism and performance at the same time, so we choose performance.
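
For anyone who wants to trade some speed for reproducibility anyway, here is a minimal sketch. It assumes a PyTorch version that exposes the torch.backends.cudnn flags, and the helper name enable_deterministic_cudnn is made up for illustration. It only affects cuDNN kernels, so other CUDA operations and multi-threaded CPU operations may still vary between runs.

```python
import torch


def enable_deterministic_cudnn() -> None:
    """Hypothetical helper: prefer reproducible cuDNN kernels over fast ones."""
    # Ask cuDNN to use deterministic algorithms where it has them.
    torch.backends.cudnn.deterministic = True
    # Disable the autotuner, which may pick different kernels on each run.
    torch.backends.cudnn.benchmark = False


enable_deterministic_cudnn()

# Re-seeding before each draw reproduces the same random numbers.
torch.manual_seed(1234)
first = torch.rand(3)
torch.manual_seed(1234)
second = torch.rand(3)
print(torch.equal(first, second))
```

Both flags can be set on a machine without a GPU; they simply have no effect there.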

cyberjoac · December 27, 2018, 5:25pm

Did you ever figure out what causes the differences between the files?
I have the same issue here.
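
One way to narrow this down is to compare the saved tensors instead of the file bytes. If every tensor matches, the weights are identical and the difference lies in how the file was written; this thread does not establish which case applies. A minimal sketch, assuming both checkpoints were written with torch.save(net.state_dict(), ...). The names diff_state_dicts and build_and_save are hypothetical, and the small Linear model and in-memory buffers stand in for the real network and model files.

```python
import io

import torch


def diff_state_dicts(sd_a, sd_b):
    """Return the keys that are missing from one dict or whose tensors differ."""
    mismatched = []
    for key in sorted(set(sd_a) | set(sd_b)):
        if key not in sd_a or key not in sd_b:
            mismatched.append(key)
        elif not torch.equal(sd_a[key], sd_b[key]):
            mismatched.append(key)
    return mismatched


def build_and_save(seed):
    """Stand-in for one training run: build a tiny model and save its state dict."""
    torch.manual_seed(seed)
    net = torch.nn.Linear(4, 2)
    buffer = io.BytesIO()  # in-memory file, used here in place of model_filename
    torch.save(net.state_dict(), buffer)
    buffer.seek(0)
    return buffer


sd_run1 = torch.load(build_and_save(1234), map_location="cpu")
sd_run2 = torch.load(build_and_save(1234), map_location="cpu")
sd_other = torch.load(build_and_save(4321), map_location="cpu")

# Same seed: no keys should be reported.
print(diff_state_dicts(sd_run1, sd_run2))
# Different seed: the differing parameter names are listed.
print(diff_state_dicts(sd_run1, sd_other))
```

With real checkpoints, replace the buffers with the two file paths passed to torch.load.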