When to `torch.jit.freeze` for deployment?

Hi all,

In the ideal PyTorch workflow from training to production deployment, where should one freeze the model? In particular, assume you are training a model that you compile to TorchScript and want to keep somewhere to use for a while into the future.

Should I `torch.jit.freeze` before saving the trained model, and always use the frozen saved model? Or should I save a normal TorchScript model, and `torch::jit::freeze` it when I load it in C++ for inference, re-freezing and optimizing it every time I start an inference process? (Start-up time is of no concern for my application, but backward compatibility is.)

Frozen models are clearly less flexible than unfrozen ones (`torch.jit.freeze`'d models cannot be moved to GPU with `.to()` · Issue #57569 · pytorch/pytorch · GitHub), but it is unclear to me whether:

  1. The optimizations they apply are system dependent. (Will I get better performance by freezing on the target system?)
  2. Frozen models are still expected to be fairly future-proof, or if they are more specific to the setup on which they were frozen.
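For concreteness, the two workflows I'm comparing look roughly like this (sketched in Python with a toy placeholder module; in my real setup the load-then-freeze path would go through the C++ API instead):

```python
import io
import torch

class TinyNet(torch.nn.Module):
    """Placeholder model standing in for the real trained network."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

# Compile to TorchScript in eval mode (freeze requires eval mode).
model = torch.jit.script(TinyNet().eval())

# Option 1: freeze before saving, and archive the frozen module.
frozen = torch.jit.freeze(model)
buf1 = io.BytesIO()
torch.jit.save(frozen, buf1)

# Option 2: archive the plain scripted module, freeze after loading.
buf2 = io.BytesIO()
torch.jit.save(model, buf2)
buf2.seek(0)
reloaded = torch.jit.load(buf2)
frozen_at_load = torch.jit.freeze(reloaded.eval())
```

Both paths produce a frozen module with identical behavior; the question is which artifact is the safer one to keep around long-term.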



Freezing optimizations should be system-independent. We introduced `optimize_for_inference` for system-dependent optimizations (does cuDNN exist, and does its version work correctly with Conv-Add-Relu fusion; is MKLDNN installed?). That one is still a little nascent, so for now I'd just recommend using it with vision models.
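A minimal sketch of the intended two-step flow on a small conv model (freeze first, then apply the system-dependent pass on the machine that will run inference; the model here is just a stand-in):

```python
import torch

# Toy vision-style model; eval mode is required before freezing.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
).eval()

scripted = torch.jit.script(model)

# Step 1: system-independent -- inlines weights, folds conv+batchnorm, etc.
frozen = torch.jit.freeze(scripted)

# Step 2: system-dependent -- picks backend-specific fusions for *this* machine,
# so run it at load time on the target system rather than saving its output.
optimized = torch.jit.optimize_for_inference(frozen)

x = torch.randn(1, 3, 16, 16)
y = optimized(x)
```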

  2. Yes, frozen models are expected to be future-proof. `optimize_for_inference` is not, because it bakes in details of the system, so saving and loading its output isn't recommended / an intended use case.

Hi @Elias_Ellison ,

Thanks for the quick answer, this is great; I was not aware of `optimize_for_inference` before. I will keep an eye on it as it becomes relevant to a broader class of models.

To flip the question around, are TorchScript models that are not frozen just as well suited to being archived for later use? (That is, if I save them unfrozen and then freeze them "on-demand" before using them.)

There shouldn't be any downside to storing non-frozen TorchScript models – they're intended to be forward compatible as well.

Great, thanks for the quick answers!