In the ideal PyTorch workflow from training to production deployment, where should one freeze the model? In particular, assume you are training a model that you compile to TorchScript and want to keep around for use well into the future.
Should I `torch.jit.freeze` before saving the trained model, and always use the frozen saved model? Or should I save a normal TorchScript model and `torch::jit::freeze` it when I load it in C++ for inference, re-freezing and optimizing it every time I start an inference process? (Start-up time is of no concern for my application, but backward compatibility is.)
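To make the two options concrete, here is a minimal sketch of what I mean (the model and names are a toy stand-in for illustration, not my real setup):

```python
import io
import torch

# Toy stand-in for the trained model (illustrative only)
class Model(torch.nn.Module):
    def forward(self, x):
        return x * 2.0

# freeze() requires a ScriptModule in eval mode
scripted = torch.jit.script(Model().eval())

# Option A: freeze in Python before saving; the saved artifact is already frozen
frozen = torch.jit.freeze(scripted)
buf_a = io.BytesIO()
torch.jit.save(frozen, buf_a)

# Option B: save the unfrozen scripted module and freeze after loading
# (in C++ deployment this would be torch::jit::load + torch::jit::freeze)
buf_b = io.BytesIO()
torch.jit.save(scripted, buf_b)
buf_b.seek(0)
loaded = torch.jit.load(buf_b)
refrozen = torch.jit.freeze(loaded.eval())
```

Both variants produce a module with the same outputs; the question is which artifact ages better on disk.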
Frozen models are clearly less flexible than unfrozen ones (see "`torch.jit.freeze`'d models cannot be moved to GPU with `.to()`", pytorch/pytorch#57569 on GitHub), but it is unclear to me whether:
- The optimizations they apply are system dependent. (Will I get better performance by freezing on the target system?)
- Frozen models are still expected to be fairly future-proof, or whether they are more specific to the setup on which they were frozen.