After digging a bit, I found this process works best for compressing my models and getting them to run in a browser:
- Train the model in float32
- Export the model to ONNX (or save it as .pth and convert to ONNX later)
- Convert from ONNX to ORT format using the ONNX Runtime converter (this roughly halves the file size)
I wonder if others are doing the same, or following different strategies.