Sorry if this is a naive question, but it arises naturally from what I see as a lack of complete step-by-step documentation for the release/deployment process.
To the best of my knowledge, the only way to effectively execute PyTorch models on host devices (using the GPU, int8 quantization, and some compiler optimizations) is to:
- Trace/script the PyTorch model with the JIT (TorchScript)
- Export this model to ONNX format on disk
- Build an application (or use an existing one) that loads and runs the PyTorch model saved previously in ONNX format
My question is: how self-contained will this code be? What libraries and other environment dependencies will it require in order to run on an x86 host server with a GPU on the customer's side?