To my knowledge, reusing networks in PyTorch typically requires a network class definition and a weights file (i.e.,
.pth), which is saved and loaded using the
With quantization, the problem is that the process (e.g., post-training quantization) modifies the network class instance. To reproduce the quantized model, a programmer must therefore either define a new class that is compatible with the modified instance, or repeat the quantization process on every new instance of the original FP32 model.
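To illustrate the mismatch, here is a minimal sketch. It assumes dynamic post-training quantization as the simplest variant, and `TinyNet` is a hypothetical toy network rather than my actual model. It compares the submodule type and the `state_dict` keys before and after quantization:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical toy network, for illustration only
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


fp32_model = TinyNet().eval()

# Dynamic post-training quantization returns a copy in which the nn.Linear
# submodules have been swapped for quantized counterparts.
quantized_model = torch.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)

fp32_keys = set(fp32_model.state_dict().keys())
quantized_keys = set(quantized_model.state_dict().keys())

# The submodule type and the state_dict layout both differ after quantization.
print(type(fp32_model.fc).__name__, sorted(fp32_keys))
print(type(quantized_model.fc).__name__, sorted(quantized_keys))
```

Because the key sets differ, the quantized weights presumably cannot be loaded straight into a fresh `TinyNet()` instance, which is the crux of my question.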
On Linux machines, applying post-training quantization to every new instance of the network might be a reasonable workaround. However, this is not possible on Windows machines, since performing quantization is not currently supported on them.
What, then, is the recommended practice for saving and loading an already-quantized network?
For example, would Python’s pickle mechanism do the job? Can I quantize a network on Linux, save it using pickle, and reload it on a Windows machine? Is that a recommended approach?
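To make the pickle idea concrete, here is a rough sketch of the round trip I have in mind. It uses only the standard library, and a `SimpleNamespace` object stands in for the quantized model (an assumption made purely for illustration; the attribute names are made up). As far as I understand, pickle records the class by its import path rather than storing the class's code, so the loading machine would need the same classes to be importable:

```python
import pickle
from types import SimpleNamespace

# Stand-in for an already-quantized model instance (NOT a real network);
# the attribute names are hypothetical.
quantized_stand_in = SimpleNamespace(scale=0.05, zero_point=3)

# "Linux side": serialize the instance to bytes (what would be written to disk).
blob = pickle.dumps(quantized_stand_in)

# The byte stream names the class by module and class name;
# it does not carry the class's code.
stores_class_path = b"SimpleNamespace" in blob and b"types" in blob

# "Windows side": loading succeeds only if that import path resolves there too.
restored = pickle.loads(blob)

print(stores_class_path, restored == quantized_stand_in)
```

If this carries over to a real quantized model, the Windows machine would need the quantized module classes to be importable in a compatible version, which is exactly the part I am unsure about.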