I am trying to experiment with low-precision (INT8) training. Is there support for this in PyTorch? Also, which quantization methods are supported?
Training in INT8 is likely a non-starter due to numerical-stability limitations, I would guess, but INT8 inference is very interesting.
However, when I compute in INT8 it isn't any faster; it's even slower. I wonder why.
PyTorch does not support efficient INT8 scoring, and if you do not have a Volta GPU you will not see any speed gain from FP16 in either training or scoring. If you want fast INT8 scoring, consider using TensorRT: you can get up to 3x faster scoring on ResNet-like nets in INT8, with "slightly" lower accuracy.
I am not aware of any native 8-bit (or lower) training, or, for that matter, inference, as compared with something like TFLite, which supports it only in specific instances. I assume this is partly because there is no canonical method for quantization. There are a number of example implementations, though; see e.g. https://github.com/eladhoffer/quantized.pytorch, or Glow.
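To make the "no canonical method" point concrete, here is a minimal pure-Python sketch of the affine (scale + zero-point) scheme that most INT8 frameworks build on. The function names and the per-tensor min/max calibration are illustrative choices, not any particular library's API:

```python
# Affine quantization sketch: map floats to signed INT8 via a scale and
# zero-point, then map back. Per-tensor min/max calibration is just one
# of several possible strategies (hence no single canonical method).

def quantize(values, num_bits=8):
    """Quantize a list of floats to signed integers in [-2^(n-1), 2^(n-1)-1]."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # range must include 0 exactly
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
```

The round trip loses at most one quantization step (`scale`) per value, which is the "slightly lower accuracy" trade-off mentioned above.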
If you look at the GitHub issues and PRs, or even the test directory in the git tree, you'll find there is good progress towards a comprehensive solution covering the various quantisation strategies.
I tried to convert my PyTorch model to ONNX and then to a TensorRT model, but I hit an unexpected error (using yolov3.onnx downloaded from the official site): it said "ERROR: Network must have at least one output". Have you ever run into this problem?