Is new serialization format documented?

jsenellart-systran · January 19, 2017, 8:49pm

Hello! In https://github.com/OpenNMT/CTranslate we parse the Lua Torch serialized format in C, so that inference runs entirely in pure C code. Is there a specification of the PyTorch serialized format somewhere, so that we can do the same with the new format?

apaszke · January 20, 2017, 6:37pm

I can write up the specs, but they’re much more complicated now. Serialized files are actually tars containing 4 other files: one listing tensors, one listing storages, one with storage data and one with system info (long size, endianness, etc.). For this use case I think it would be much simpler to use a more standardized format that has a C library for loading it (such as HDF5 or protobuf). Do you think that would be OK?
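As a rough illustration of this layout, the sketch below opens a checkpoint as a plain tar archive with Python’s standard `tarfile` module and returns each member as raw bytes, without unpickling anything. It assumes the member names `sys_info`, `pickle`, `tensors` and `storages` used by the tar-based format of that time; later PyTorch releases switched `torch.save` to a zip-based format, so this only applies to old checkpoints. The function name `read_legacy_checkpoint` is hypothetical.

```python
import io
import tarfile

# Member names assumed from the legacy tar-based checkpoint layout.
EXPECTED_MEMBERS = {"sys_info", "pickle", "tensors", "storages"}


def read_legacy_checkpoint(fileobj):
    """Return a dict mapping each tar member name to its raw bytes.

    Nothing is unpickled: the blobs are returned as-is, which is roughly
    what a C reader would have in hand after untarring the file.
    """
    blobs = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                blobs[member.name] = tar.extractfile(member).read()
    missing = EXPECTED_MEMBERS - set(blobs)
    if missing:
        raise ValueError(f"not a legacy checkpoint, missing: {sorted(missing)}")
    return blobs
```

Usage would look like `with open("model.pt", "rb") as f: blobs = read_legacy_checkpoint(f)`, where `model.pt` is a hypothetical old-format checkpoint. Keeping the blobs as bytes mirrors the point above: each member is an opaque binary blob that still has to be decoded separately.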

edgarriba · January 21, 2017, 11:53am

This is somewhat related:
https://discuss.pytorch.org/t/import-export-models

jsenellart-systran · January 23, 2017, 7:54am

Thanks for your answer. Yes, I think that would be great. But couldn’t you use the native Torch serialization, since you already have it at hand in the TH* libraries? The 4 containers you mention could simply be put in a Lua table, which would avoid depending on other libraries. I understand that PyTorch objects (Variables) would not be compatible with Lua module objects, but at least they would be readable.

apaszke · January 23, 2017, 10:02am

No, we can’t. These are not containers; all four are binary blobs.

In Lua Torch, serialization was written from scratch. In PyTorch we depend on pickle to save objects, and while that gives us very robust serialization that conforms to Python standards, it also gives us less control and is unreadable by Lua. I don’t think it’s worth changing that. It’s a complex matter, and conforming even to some of the old Lua standards would hurt usability for all Python users.

HDF5 is quite widespread and will provide you with a way to save weights in Python and load them in Lua.

jsenellart-systran · January 23, 2017, 10:06pm

OK, I understand; I thought it was a custom serialization. I untarred the .pt file, and the unpickling recipe seems to be spelled out in serialization.py. Since our end goal is reading from C, I think we can deal with untar/unpickle, and from C dump back to a Lua format if we want to go back to Lua/nngraph. In addition, we do need the compute graph structure, so you would need to create a new format just for that, whether in HDF5 or some other serialization. I will give untarring/unpickling from C a shot.
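Before writing an unpickler in C, it may help to look at the pickle blob as a flat stream of opcodes rather than as Python objects. The sketch below uses the standard `pickletools.genops` to walk the opcodes without importing or executing anything, and collects the `GLOBAL` references (module and class names) that a C unpickler would have to resolve on its own. It assumes a pickle protocol of 3 or lower, where class references are written with the `GLOBAL` opcode; protocol 4 and later use `STACK_GLOBAL`, which this sketch does not handle. `TinyModel` and `global_references` are hypothetical names used only for illustration.

```python
import pickle
import pickletools


class TinyModel:
    """Hypothetical stand-in for a model object with plain attributes."""

    def __init__(self):
        self.weight = [0.5, -1.0]
        self.bias = 0.1


def global_references(data):
    """List the "module name" strings carried by GLOBAL opcodes.

    pickletools.genops only decodes the byte stream; it never imports
    or instantiates anything, so nothing in the pickle gets executed.
    """
    refs = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            refs.append(arg)  # module and name, separated by a space
    return refs


data = pickle.dumps(TinyModel(), protocol=2)
print(global_references(data))
```

For a plain object like this, the only class reference in the stream is the object's own class; everything else is ordinary data (the attribute dict), which is the part a C reader can decode directly.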

apaszke · January 23, 2017, 11:00pm

Keep in mind that PyTorch checkpoints don’t contain the model structure - pickle doesn’t save the code, and it’s the runtime that defines the graph. You can’t go from a pickled model to nngraph or any other representation using only the .pt file. You’d need to dump the graph structure for a single iteration first (starting from output.creator).
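Dumping the graph for one iteration amounts to a traversal that starts at the output’s creator and follows links back toward the inputs. The sketch below shows that traversal on a hypothetical `Node` class, which only stands in for PyTorch’s autograd function objects and does not import PyTorch. It assumes the shape used by later PyTorch releases, where the entry point is `output.grad_fn` and each function exposes its inputs through `next_functions` as `(function, index)` pairs, with `None` for inputs that need no gradient. `dump_graph` is likewise a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for an autograd function object."""
    name: str
    # Mirrors (function, index) pairs; None marks an input without grad.
    next_functions: tuple = field(default_factory=tuple)


def dump_graph(output_fn):
    """Return (node, input_node) name pairs reachable from the output."""
    edges = []
    seen = set()
    stack = [output_fn]
    while stack:
        node = stack.pop()
        if id(node) in seen:  # shared subgraphs are expanded only once
            continue
        seen.add(id(node))
        for child, _index in node.next_functions:
            if child is None:
                continue
            edges.append((node.name, child.name))
            stack.append(child)
    return edges


# Tiny example graph: add(matmul(weight, <input without grad>))
weight = Node("weight")
matmul = Node("matmul", ((weight, 0), (None, 0)))
out = Node("add", ((matmul, 0),))
print(dump_graph(out))
```

The resulting edge list describes only the graph built for that particular iteration; a different input could produce a different graph, which is why it has to be recorded from a run rather than read from the checkpoint.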

apaszke · January 23, 2017, 11:00pm

I think in the future we’ll also have to figure out some kind of format that allows for exporting graphs. This will be needed for Caffe2 and TF model exports.

jsenellart-systran · January 23, 2017, 11:21pm

Let me rephrase to make sure I understand correctly :).

In Lua Torch, the nngraph objects are saved in the serialized file, but we still need the code to build each class, with each module implementing its own load function. However, the .t7 file together with the corresponding logic gives us enough information to build something functional.

Do you mean that in Python the .pt file does not contain the graph structure at all, and that to load a saved model we need to provide the graph structure ourselves?
In the .pt file, I see these files:
sys_info => just a dictionary with system info
tensors => the description of the tensors
storage => the actual storage data
pickle => I thought this was where I would find the graph structure (or the Variable expression). Isn’t it?

Thanks!
Jean

apaszke · January 24, 2017, 4:50pm

Yes, because in Lua torch nngraph objects are the same all the time. PyTorch builds the graph dynamically, so every iteration uses fresh Python objects to represent the graph, and the recipe for the graph construction is the model’s code.
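The difference can be illustrated without PyTorch. In the toy sketch below, a hypothetical `Value` class records which values it was computed from, so the "graph" exists only as objects created while the forward code runs. Two calls build two separate sets of node objects, and because `forward` uses ordinary Python control flow, the shape of the recorded graph depends on the input. This is a simplified model of define-by-run graph construction, not PyTorch’s actual autograd implementation.

```python
class Value:
    """Hypothetical node: a number plus the values it was computed from."""

    def __init__(self, data, parents=(), op="input"):
        self.data = data
        self.parents = parents
        self.op = op

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), "mul")


def forward(x, w):
    # Ordinary Python control flow decides how many ops get recorded.
    # Assumes x and w are positive and w > 1, so the loop terminates.
    y = x * w
    while y.data < 10:
        y = y * w
    return y


def depth(v):
    """Number of recorded ops on the longest path back to an input."""
    if not v.parents:
        return 0
    return 1 + max(depth(p) for p in v.parents)


w = Value(2.0)
out_a = forward(Value(1.0), w)  # 2, 4, 8, 16: four multiplications
out_b = forward(Value(6.0), w)  # 12: one multiplication
print(depth(out_a), depth(out_b))
```

Nothing in `Value` stores the loop itself; only the code of `forward` does, which mirrors why the model’s code, rather than a saved object, is the recipe for the graph.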

Your descriptions are correct, but pickle won’t serialize your model class: it only saves a string like my_models.MyModel, plus the object’s __dict__ entries. The name is a pointer that lets pickle find your class at runtime; it then instantiates the class and copies the __dict__ items into it. As you can see, no graph is serialized there, only the parameters and the class name.
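This behaviour can be checked with the standard library alone. In the sketch below, `MyModel` is a hypothetical class: pickling an instance records a reference to the class plus the instance’s `__dict__`, so unpickling restores the attributes while the class is importable, but fails once the class can no longer be found by name, because no code was stored.

```python
import pickle
import sys


class MyModel:
    """Hypothetical model: the forward logic exists only in this code."""

    def __init__(self):
        self.weight = [1.0, 2.0]

    def forward(self, x):
        return [w * x for w in self.weight]


blob = pickle.dumps(MyModel())

# While the class can be found by name, __dict__ is copied back in.
restored = pickle.loads(blob)
print(restored.weight, restored.forward(3.0))

# Temporarily remove the class from its module: pickle kept only the name.
module = sys.modules[MyModel.__module__]
saved_cls = module.MyModel
del module.MyModel
failed = False
try:
    pickle.loads(blob)
except AttributeError as exc:
    failed = True
    print("cannot unpickle without the class:", exc)
finally:
    module.MyModel = saved_cls  # put the class back
```

The same applies to a model checkpoint: the parameters travel in the file, but the `forward` method, and therefore the graph it builds, only exists in the code that defines the class.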