Packaging pytorch topology first and checkpoints later

Erotemic · August 17, 2021, 4:04pm

I’d like to clarify the use case.

At the end of training, my trainer saves the best N candidate checkpoints based on training and validation metrics. I want to create a torch package for each of these N checkpoints, producing N corresponding candidate packages that can then be evaluated in standalone prediction+evaluation code.

I could do this by looping over each checkpoint and calling model.load_state_dict(torch.load(checkpoint_path)), but this seems heavy-handed and unnecessary. The call to torch.load is expensive, and calling model.load_state_dict is not thread safe and has the side effect of modifying the current state of the model (I would like to avoid side effects if possible).

Is there a way to structure a torch.package so that I can make a copy of some “base” zipfile and then simply modify the copy by adding the desired checkpoint containing the pickled model_state_dict? Ideally the base zipfile would store only the network topology. It might contain a dummy file marking where the weight checkpoint would go, but it should stay small to minimize the time it takes to copy the file.
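To make the idea concrete, here is a rough sketch of the "copy the base zipfile, then append the checkpoint" step using only the Python standard library. The function name `make_candidate_package` and the default member name `checkpoint/model_state_dict.pt` are hypothetical, chosen only for illustration. This shows the zip-level operation only; whether torch.package's importer would accept or locate a member appended this way is exactly the open question, and it is not verified here.

```python
import shutil
import zipfile
from pathlib import Path


def make_candidate_package(base_zip, checkpoint_path, out_zip,
                           arcname="checkpoint/model_state_dict.pt"):
    """Copy a base archive and append one checkpoint file to the copy.

    `arcname` is a hypothetical location inside the archive; it is an
    assumption, not a layout that torch.package is known to expect.
    """
    base_zip = Path(base_zip)
    checkpoint_path = Path(checkpoint_path)
    out_zip = Path(out_zip)

    # Copy the (ideally small) topology-only archive; the base is not modified.
    shutil.copyfile(base_zip, out_zip)

    # Open the copy in append mode and add the checkpoint bytes as a new member.
    # ZIP_STORED skips compression, so the checkpoint is written as-is.
    with zipfile.ZipFile(out_zip, mode="a", compression=zipfile.ZIP_STORED) as zf:
        zf.write(checkpoint_path, arcname=arcname)

    return out_zip
```

The trainer would then call this once per checkpoint path, with no torch.load and no change to the in-memory model, since the checkpoint file is copied as raw bytes.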