Why does the checkpoint size become smaller after I resave it?

wadewang · January 14, 2022, 2:36pm

I trained a model, but I need to modify one entry in the model state dict, so I use the following code to change the value of the step key:

import torch

def modify_model_state_dict():
    path = "/media/wwd/2THardDisk/models/base.pt"
    # The checkpoint contains two parts: "model_state" and "optimizer_state"
    checkpoint = torch.load(path)
    # Shallow copy, so the loaded dict itself is left untouched
    new_state_dict = checkpoint["model_state"].copy()
    new_state_dict["step"] = torch.tensor([1], device="cuda:0")  # reset the step to 1
    new_path = path[:-3] + "_new.pt"  # base.pt -> base_new.pt
    torch.save({
        "model_state": new_state_dict,
        "optimizer_state": checkpoint["optimizer_state"],
    }, new_path)

modify_model_state_dict()

After saving the modified model, I checked its size and found that it is 526,147,901 bytes, while the original model is 526,153,469 bytes. I am curious why the model became smaller: where did the 5,568 bytes go? Does this mean some information was lost?
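
One way to check whether any tensor data was lost is to compare the two state dicts key by key. The following is only a minimal sketch: the helper name differing_keys and the tiny in-memory dictionaries are hypothetical stand-ins, not part of the original checkpoint code.

```python
import torch

def differing_keys(state_a, state_b, skip=()):
    """Return the sorted keys whose tensors differ between two state dicts."""
    differing = []
    for key in state_a.keys() | state_b.keys():
        if key in skip:
            continue
        a, b = state_a.get(key), state_b.get(key)
        # A key missing on one side, or unequal tensors, counts as a difference.
        if a is None or b is None or not torch.equal(a.cpu(), b.cpu()):
            differing.append(key)
    return sorted(differing)

# Hypothetical stand-ins for the original and the resaved "model_state".
old = {"weight": torch.ones(2, 3), "step": torch.tensor([500])}
new = dict(old)
new["step"] = torch.tensor([1])

print(differing_keys(old, new))                 # every key that differs
print(differing_keys(old, new, skip={"step"}))  # ignoring the intended change
```

With the real files, both checkpoints could be loaded with torch.load(path, map_location="cpu") and their "model_state" entries passed to the helper. If only "step" is reported, the tensor contents are otherwise identical.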

ptrblck · January 16, 2022, 9:17pm

By default, torch.save should compress the data, so I guess that, depending on the algorithm actually used, changes in the data could result in different file sizes.

wadewang · January 21, 2022, 5:39am

I did not change the algorithm. Later, I used the modified model for inference, and it seemed to work normally. Recently, I continued training from this modified model; no error was reported, and the newly saved model's size returned to the previous 526,153,469 bytes. So it remains a mystery. I don't know the reason behind it; my guess is that it is caused by the secondary compression, but I did not verify this, since the model works without problems and I did not look into it further.
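
One possible contributor, not confirmed in this thread, is the .copy() call. The dict returned by nn.Module.state_dict() is an OrderedDict carrying an extra _metadata attribute, and OrderedDict.copy() copies the items but not instance attributes. The sketch below shows this with the standard library only. The keys, values, and metadata contents are hypothetical stand-ins, and it assumes "model_state" was produced by state_dict().

```python
import pickle
from collections import OrderedDict

# Stand-in for a module state dict (plain ints instead of tensors).
state = OrderedDict(weight=1, step=2)
# Hypothetical contents, mimicking the `_metadata` attribute that
# nn.Module.state_dict() attaches to the dict it returns.
state._metadata = OrderedDict({"": {"version": 1}, "encoder": {"version": 1}})

# copy() keeps the key/value pairs but drops instance attributes.
copied = state.copy()

has_metadata_before = hasattr(state, "_metadata")
has_metadata_after = hasattr(copied, "_metadata")

# Pickle stores instance attributes too, so the copy serializes smaller.
size_before = len(pickle.dumps(state))
size_after = len(pickle.dumps(copied))

print(has_metadata_before, has_metadata_after)
print(size_before - size_after, "bytes smaller after copy()")
```

If this is what happened, the tensors themselves would be unaffected. Resuming training and saving a fresh state_dict() would attach the metadata again, which would be consistent with the size returning to its original value.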