I was reading the Save and Load Model tutorial, but it wasn’t clear why I’d use torch.save over pickle.dump.
What worries me is that my neural net modules/objects contain many more things besides just parameters. Since PyTorch saves things using the state_dict, I worried that something might be missing (https://pytorch.org/tutorials/beginner/saving_loading_models.html).
Isn’t it easier, for my use case, to simply use pickle.dump? Why would I use torch.save in general?
The state_dict will store all registered parameters and buffers.
If you need to serialize additional tensors, you should thus create an nn.Parameter if the tensor is trainable, or register a buffer via self.register_buffer(name, tensor) if it’s not.
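As a quick sanity check (a minimal sketch; the module and attribute names are made up), only registered parameters and buffers show up in the state_dict, while plain tensor attributes are silently skipped:

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # registered parameter: trainable and included in the state_dict
        self.weight = nn.Parameter(torch.randn(4, 4))
        # registered buffer: not trainable, but still included in the state_dict
        self.register_buffer("running_stat", torch.zeros(4))
        # plain attribute: NOT included in the state_dict
        self.some_tensor = torch.ones(4)

m = MyModule()
print(m.state_dict().keys())
# odict_keys(['weight', 'running_stat'])  <- 'some_tensor' is missing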
That’s generally not recommended.
Instead, you should store the state_dicts and the source files separately.
Storing the complete model, for example, could force you to recreate exactly the same file and folder structure.
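In code, the recommended workflow could look roughly like this (the file name and model class are placeholders):

import torch
from models.ExampleNet import ExampleNet  # model definition lives in your source files

# saving: persist only the state_dict, not the module object
model = ExampleNet()
torch.save(model.state_dict(), "example_net.pt")

# loading: rebuild the model from the current source, then restore the tensors
model = ExampleNet()
model.load_state_dict(torch.load("example_net.pt"))
model.eval()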
Hi @ptrblck, thanks for your responses! Greatly appreciated.
I am curious: why is it not recommended to store the whole Python program state (or as close to that as possible)?
I am currently using dill to simulate this as much as possible because my neural network module classes contain pointers to many things, including lambda functions I’d like to restore properly.
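To illustrate what I mean (a rough sketch; the class and attribute are just examples), the standard pickle module refuses to serialize a module holding a lambda, while dill handles it:

import pickle
import dill
import torch
import torch.nn as nn

class LambdaNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.post = lambda x: x * 2  # plain pickle cannot serialize this

net = LambdaNet()

try:
    pickle.dumps(net)
except Exception as e:
    print("pickle failed:", e)  # can't pickle the local lambda

# dill serializes the lambda by value, so the round trip works
restored = dill.loads(dill.dumps(net))
print(restored.post(torch.ones(1)))  # tensor([2.])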
As mentioned in my last post, you could be forced to recreate the exact file and folder structure if you save the complete model via torch.save.
I don’t know how dill handles this case.
Yes, the same file structure needs to be maintained.
Moreover, the function that was active at save time will still be the one used after loading.
I loaded a YOLOv5 model (version 4.0, which uses SiLU as the activation), but accidentally used a local copy of YOLOv5 version 1.0, which uses LeakyReLU. The loaded model still used SiLU, so I guess the function is stored in the serialized checkpoint.
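My understanding (just a sketch, not verified against the YOLOv5 source) is that the activation is a child module instance stored inside the pickled object, and unpickling restores the saved attributes without re-running __init__, so the old activation survives:

import torch
import torch.nn as nn

class Conv(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.act = nn.SiLU()  # activation at save time

model = Conv()
torch.save(model, "conv_full.pt")  # pickles the whole object, including self.act

# Even if the local Conv class were later edited to use nn.LeakyReLU(),
# loading restores the saved attributes, so self.act is still SiLU.
# (On recent PyTorch releases, pass weights_only=False to unpickle full objects.)
loaded = torch.load("conv_full.pt")
print(type(loaded.act))  # <class 'torch.nn.modules.activation.SiLU'>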
@ptrblck I have a follow-up question to this. In production, we have frequently run into incompatibility issues caused by saving/loading across changed source files. For example, let’s say we trained and deployed a model in January which would be loaded like so:
import torch
from models.ExampleNet import ExampleNet

# rebuild the architecture from source, then restore the trained tensors
model = ExampleNet()
loaded = torch.load("example_saved_file.pt")
loaded_state_dict = loaded["state_dict"]
model.load_state_dict(loaded_state_dict)
This seems fine. However, in February some changes to the model definition were made as part of the R&D process; perhaps additional norm layers were added, or the model was made deeper. These changes were subsequently pushed to dev, staging, and prod, and deployed to the live system. The above loading procedure now fails since the model architecture doesn’t match that of the loaded_state_dict. Though saving the model directly is not generally recommended, would doing so here with something like torch.save(trained_model, path) (instead of saving its state_dict) be a good approach, or at least a workaround? I’m curious to know if there are additional factors that would still make it a bad idea.
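To make the failure concrete, here is a rough sketch (the layer names are invented) of the kind of error we hit:

import torch
import torch.nn as nn

# January definition, which produced the checkpoint
class ExampleNetV1(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

# February definition, with an extra norm layer added during R&D
class ExampleNetV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.norm = nn.BatchNorm1d(16)
        self.fc = nn.Linear(16, 4)

state_dict = ExampleNetV1().state_dict()
model = ExampleNetV2()
model.load_state_dict(state_dict)
# RuntimeError: Error(s) in loading state_dict for ExampleNetV2:
#   Missing key(s) in state_dict: "norm.weight", "norm.bias",
#   "norm.running_mean", "norm.running_var", ...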
I don’t believe so, as loading the full model could easily fail after source code changes, and the error messages might be quite confusing if the model definition changed.
Especially if your model uses and depends on a lot of imports from other scripts, which could themselves change, debugging could become quite challenging.
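If you want the mismatch to fail with a clearer message, one option (just a sketch, reusing model and loaded_state_dict from your snippet above) is to load with strict=False and inspect the reported keys yourself:

# strict=False returns the mismatches instead of raising immediately
result = model.load_state_dict(loaded_state_dict, strict=False)
if result.missing_keys or result.unexpected_keys:
    raise RuntimeError(
        f"checkpoint does not match the model definition: "
        f"missing={result.missing_keys}, unexpected={result.unexpected_keys}"
    )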