[Torchvision] Saving PyTorch model to file issue

Hi everyone,

I’m a bit confused about saving the model. As suggested, I did something like this:

torch.save(model.state_dict(), PATH_1)

Then I rerun the code and save to a different file:

torch.save(model.state_dict(), PATH_2)

When I use the cmp command to compare the two binary files, there are differences between them. I believe it is because the float numbers are stored slightly differently in the two files, even though they still refer to the same values.
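For reference, a Python equivalent of the cmp check (just a sketch; PATH_1 and PATH_2 are the paths from above):

with open(PATH_1, "rb") as f1, open(PATH_2, "rb") as f2:
    data1, data2 = f1.read(), f2.read()

print("sizes:", len(data1), len(data2))
# Locate the first differing byte between the two saved files
for offset, (b1, b2) in enumerate(zip(data1, data2)):
    if b1 != b2:
        print(f"first difference at byte {offset}: {b1:#04x} vs {b2:#04x}")
        break
else:
    print("no difference up to the shorter file's length")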

My requirement for the project is to produce identical binary files every time we store the same model, so this is quite troublesome.
Do you have any solution or workaround to deal with this?

Thanks a lot for your support,

Cheers,

If you are rerunning the code, are you following the reproducibility docs?

If you store the model.state_dict() in two different files, both will contain the same binary representation.

However, if you don’t use deterministic operations, your model will of course differ.
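Something like this is the usual starting point (a minimal sketch; the exact flags depend on your PyTorch version, so check the reproducibility docs):

import torch

torch.manual_seed(0)                       # seed the CPU (and CUDA) RNGs
torch.use_deterministic_algorithms(True)   # raise an error on nondeterministic ops
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable cuDNN autotuning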

Hi,

Thank you for your response. By rerunning the code, I meant that I already have the *.pt file, so I just load the model and save it again to a different file.
I noticed that if I load the model and save it to different files within the same script, the files are identical, but if I do it in two different runs, the files differ.
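This is roughly the workflow (a sketch; the file names are placeholders):

import hashlib
import torch

state_dict = torch.load("model.pt", map_location="cpu")
torch.save(state_dict, "copy_a.pt")
torch.save(state_dict, "copy_b.pt")

def sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Within one run the two copies hash identically; across two separate
# runs of this script, the hashes differ.
print(sha256("copy_a.pt"), sha256("copy_b.pt"))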

I don’t know much about deterministic operations; could you point me to some links/readings so I can do a bit of research on it?

Thanks again and have a nice day,
Regards,

You are right, and I’m not sure where these differing bits come from.

The linked docs give you a good overview of reproducibility.
However, even setting the same seed in both scripts will yield different binary representations, and I think it might be related to the pickling itself.
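To narrow it down, you could compare the deserialized tensors instead of the raw bytes (a quick sketch; the paths are placeholders). If all tensors match but the files still differ, the difference lives in the serialization metadata rather than in the values:

import torch

sd1 = torch.load("run1.pt", map_location="cpu")
sd2 = torch.load("run2.pt", map_location="cpu")

assert sd1.keys() == sd2.keys()
# True if every parameter is bitwise-identical between the two files
print(all(torch.equal(sd1[k], sd2[k]) for k in sd1))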

Hi,

Thanks for your response. Yes, I was also thinking about the pickling, but I am struggling with how to control the precision (to some number of digits after the decimal point) when storing to the binary file. If I understand properly, the issue comes from how floats are stored in the file. For instance, 5.002 might be stored as 5.002000000000001 in file 1 and 5.002000000000002 in file 2.
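To illustrate what I mean (a rough sketch in plain Python, not tied to my model files):

from decimal import Decimal

# 5.002 has no exact binary representation; the nearest representable
# double is what actually gets stored.
print(Decimal(5.002))  # 5.0019999999999997797...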

It would be nice if you know of any option to tell torch.save which precision level to use, or maybe it doesn’t exist at all.

Regards,