Is saving a checkpoint an indivisible (atomic) transaction?

If the process terminates mid-save, will the write be rolled back entirely, so that any pre-existing save is preserved and not overwritten?

I’m getting an error when loading a saved state_dict after a training session that terminated mid-save, so it seems the write completes only partially and yields a corrupted checkpoint.

Is it because I’m also saving some int values and loading with map_location?

@Sam_Lerman The save internally uses Python pickling to write the objects to the file. Pickling is not a single operation: it involves multiple writes, such as pickling headers and the various child objects contained in the object you are pickling. And if you are overwriting the same location, the file has been opened in write mode, so the existing data is lost if the save is interrupted midway.
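To illustrate the write-mode point, here is a minimal sketch using the standard-library `pickle` as a stand-in for the checkpoint writer (that substitution, the file name `ckpt.pkl`, and the variable names are assumptions for illustration). It shows that re-opening an existing file in write mode empties it before any new bytes are written:

```python
import os
import pickle
import tempfile

# Hypothetical checkpoint path inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")

# First save: completes normally.
with open(path, "wb") as f:
    pickle.dump({"step": 1}, f)
size_before = os.path.getsize(path)

# Second save: opening in "wb" truncates the existing file immediately.
f = open(path, "wb")
size_during = os.path.getsize(path)  # the old checkpoint is already gone here
f.close()
```

If the process dies anywhere between that `open` and the end of the write, only an empty or partial file is left at `path`.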

If you want atomic writes, save to a temporary location first and move the file to its final path only once the save has succeeded.
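The temp-file approach could look roughly like the sketch below. The helper name `atomic_save` and its `write_fn` parameter are hypothetical, and `pickle.dump` is used as the default writer only to keep the example self-contained; with PyTorch you would presumably pass your own save function instead. It relies on `os.replace`, which is documented as an atomic rename on POSIX when source and destination are on the same filesystem, so the temp file is created next to the target.

```python
import os
import pickle
import tempfile


def atomic_save(obj, path, write_fn=pickle.dump):
    """Write obj to path so that path holds either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            write_fn(obj, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Save failed or was interrupted: drop the partial temp file, leave path untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Usage would be something like `atomic_save(state, "ckpt.pkl")`. Note that a hard kill (for example `SIGKILL` or a power loss) skips the cleanup branch, so a stray `.tmp` file may remain, but the previous checkpoint at `path` is not touched.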