How to convert a QAT model to an ONNX model

Hi, I want to know: can a QAT model in PyTorch be converted to an ONNX model?

I have tried both FX graph mode quantization and PyTorch 2 export (PT2E) quantization, and I can run quantization-aware training with both of them on YOLOv5s. I want to export the result to an ONNX model to accelerate inference on a chip. But when I try torch.onnx.export and torch.onnx.dynamo_export, they raise errors like this:

from user code:
   File "<eval_with_key>.5", line 7, in forward
    quantize_per_tensor = torch.quantize_per_tensor(x, model_0_conv_input_scale_0, model_0_conv_input_zero_point_0, torch.quint8);  x = model_0_conv_input_scale_0 = model_0_conv_input_zero_point_0 = None

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/yolov5/train.py", line 1021, in <module>
    main(opt)
  File "/root/yolov5/train.py", line 723, in main
    train(opt.hyp, opt, device, callbacks)
  File "/root/yolov5/train.py", line 475, in train
    torch.onnx.dynamo_export(
  File "/opt/conda/lib/python3.10/site-packages/torch/onnx/__init__.py", line 517, in dynamo_export
    return dynamo_export(
  File "/opt/conda/lib/python3.10/site-packages/torch/onnx/_internal/_exporter_legacy.py", line 1233, in dynamo_export
    raise errors.OnnxExporterError(message) from e
torch.onnx.OnnxExporterError: Failed to export the model to ONNX. Generating SARIF report at 'report_dynamo_export.sarif'. SARIF is a standard format for the output of static analysis tools. SARIF logs can be loaded in VS Code SARIF viewer extension, or SARIF web viewer (https://microsoft.github.io/sarif-web-component/). Please report a bug on PyTorch Github: https://github.com/pytorch/pytorch/issues

So I want to know: is there any documentation that explains how to export a QAT model to an ONNX model?
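
For reference, the FX graph mode flow I'm running looks roughly like this (a minimal sketch: `create_yolov5s` is a placeholder for the real model construction, the training loop is elided, and "fbgemm" is just the backend I chose):

    import torch
    from torch.ao.quantization import get_default_qat_qconfig_mapping
    from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_fx

    model = create_yolov5s()  # placeholder for the real model construction
    example_inputs = (torch.rand(1, 3, 640, 640),)

    # Insert fake-quant observers for QAT
    qconfig_mapping = get_default_qat_qconfig_mapping("fbgemm")
    model.train()
    model_prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)

    # ... quantization-aware training loop ...

    # Convert to a quantized model
    model_prepared.eval()
    model_quantized = convert_fx(model_prepared)

    # Both exporters fail here with errors like the one above
    torch.onnx.export(model_quantized, example_inputs, "qat.onnx")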

Hello!
I'm facing the same issue here. Any help is appreciated.

Best Regards,
Christin

Hi, I have tried converting the FX-mode QAT model to a quantized model before exporting; my code is as follows:

    from torch.ao.quantization import quantize_fx

    model.eval()
    model_quantized = quantize_fx.convert_fx(model)

    # dynamo_export takes only the model and its inputs; it returns an
    # ONNXProgram, which is saved in a separate step.
    onnx_program = torch.onnx.dynamo_export(
        model_quantized.cpu(),  # --dynamic is only compatible with CPU
        torch.rand(1, 3, 640, 640),
    )
    onnx_program.save("/root/yolov5/qat.onnx")

but it doesn't work; the error is as follows:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 2013, in _dispatch_impl
    r = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 716, in __call__
    return self._op(*args, **kwargs)
NotImplementedError: aten::quantize_per_tensor.tensor_qparams: attempted to run this operator with Meta tensors, but there was no fake impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add a fake impl. Please see the following for next steps:  https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html

During handling of the above exception, another exception occurred: 

File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2148, in run_node
    unimplemented(make_error_message(e), from_exc=e)
  File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/exc.py", line 296, in unimplemented
    raise Unsupported(msg, case_name=case_name) from from_exc
torch._dynamo.exc.Unsupported: Failed running call_function <built-in method quantize_per_tensor of type object at 0x7f0647a0d1c0>(*(FakeTensor(..., size=(1, 3, 640, 640)), FakeTensor(..., size=()), FakeTensor(..., size=(), dtype=torch.int64), torch.quint8), **{}):
quantized nyi in meta tensors

from user code:
   File "<eval_with_key>.5", line 7, in forward
    quantize_per_tensor = torch.quantize_per_tensor(x, model_0_conv_input_scale_0, model_0_conv_input_zero_point_0, torch.quint8);  x = model_0_conv_input_scale_0 = model_0_conv_input_zero_point_0 = None

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/yolov5/train.py", line 1016, in <module>
    main(opt)
  File "/root/yolov5/train.py", line 718, in main
    train(opt.hyp, opt, device, callbacks)
  File "/root/yolov5/train.py", line 470, in train
    torch.onnx.dynamo_export(
  File "/opt/conda/lib/python3.10/site-packages/torch/onnx/__init__.py", line 517, in dynamo_export
    return dynamo_export(
  File "/opt/conda/lib/python3.10/site-packages/torch/onnx/_internal/_exporter_legacy.py", line 1233, in dynamo_export
    raise errors.OnnxExporterError(message) from e
torch.onnx.OnnxExporterError: Failed to export the model to ONNX. Generating SARIF report at 'report_dynamo_export.sarif'. SARIF is a standard format for the output of static analysis tools. SARIF logs can be loaded in VS Code SARIF viewer extension, or SARIF web viewer (https://microsoft.github.io/sarif-web-component/). Please report a bug on PyTorch Github: https://github.com/pytorch/pytorch/issues
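
For completeness, the PT2 export (PT2E) flow I tried is roughly the following (again a sketch: `create_yolov5s` is a placeholder, XNNPACKQuantizer is just the quantizer I experimented with, and the capture API differs between PyTorch releases):

    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    model = create_yolov5s()  # placeholder for the real model construction
    example_inputs = (torch.rand(1, 3, 640, 640),)

    # Capture the model for PT2E (older releases use
    # torch._export.capture_pre_autograd_graph instead).
    captured = torch.export.export_for_training(model, example_inputs).module()

    quantizer = XNNPACKQuantizer().set_global(
        get_symmetric_quantization_config(is_qat=True)
    )
    model_prepared = prepare_qat_pt2e(captured, quantizer)

    # ... quantization-aware training loop ...

    model_quantized = convert_pt2e(model_prepared)

    # This export step is where it fails for me as well
    torch.onnx.export(model_quantized, example_inputs, "qat_pt2e.onnx")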