Experiencing the same issue. However, if qconfig is set to qnnpack (model.qconfig = torch.quantization.get_default_qconfig('qnnpack')), this error goes away, but another issue pops up.
I am getting the following error with the same code, except that qconfig is set to qnnpack. Is there a fix for this? Is there any way to export a quantized PyTorch model to ONNX?
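For reference, the export call I am making looks roughly like the sketch below (reconstructed from the cell in the traceback; `model`, `inputs`, `outputs`, and `f` are my own placeholder names, and the prepare/convert steps are omitted):

```python
import io
import torch

# Quantization config under which the new error appears (assumption: static
# post-training quantization was already applied via prepare/convert).
model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
# ... torch.quantization.prepare / calibrate / convert steps omitted ...

# Export the quantized model to an in-memory buffer, falling back to ATen ops
# for anything ONNX cannot represent natively.
f = io.BytesIO()
torch.onnx.export(
    model,
    inputs,                       # example input tensor(s)
    f,
    example_outputs=outputs,      # example output(s) from a forward pass
    # opset_version=10,
    operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
)
```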
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-54-d1ee04c303f8> in <module>()
24 example_outputs=outputs,
25 # opset_version=10,
---> 26 operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
27 # f.seek(0)
28 onnx_model = onnx.load(f)
C:\Users\mhamdan\AppData\Roaming\Python\Python37\site-packages\torch\onnx\__init__.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
228 do_constant_folding, example_outputs,
229 strip_doc_string, dynamic_axes, keep_initializers_as_inputs,
--> 230 custom_opsets, enable_onnx_checker, use_external_data_format)
231
232
C:\Users\mhamdan\AppData\Roaming\Python\Python37\site-packages\torch\onnx\utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
89 dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs,
90 custom_opsets=custom_opsets, enable_onnx_checker=enable_onnx_checker,
---> 91 use_external_data_format=use_external_data_format)
92
93
C:\Users\mhamdan\AppData\Roaming\Python\Python37\site-packages\torch\onnx\utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, opset_version, _retain_param_name, do_constant_folding, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, enable_onnx_checker, use_external_data_format, onnx_shape_inference, use_new_jit_passes)
637 training=training,
638 use_new_jit_passes=use_new_jit_passes,
--> 639 dynamic_axes=dynamic_axes)
640
641 # TODO: Don't allocate a in-memory string for the protobuf
C:\Users\mhamdan\AppData\Roaming\Python\Python37\site-packages\torch\onnx\utils.py in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, example_outputs, _retain_param_name, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, use_new_jit_passes, dynamic_axes)
419 fixed_batch_size=fixed_batch_size, params_dict=params_dict,
420 use_new_jit_passes=use_new_jit_passes,
--> 421 dynamic_axes=dynamic_axes, input_names=input_names)
422 from torch.onnx.symbolic_helper import _onnx_shape_inference
423 if isinstance(model, torch.jit.ScriptModule) or isinstance(model, torch.jit.ScriptFunction):
C:\Users\mhamdan\AppData\Roaming\Python\Python37\site-packages\torch\onnx\utils.py in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict, use_new_jit_passes, dynamic_axes, input_names)
180 torch.onnx.symbolic_helper._quantized_ops.clear()
181 # Unpack quantized weights for conv and linear ops and insert into graph.
--> 182 torch._C._jit_pass_onnx_unpack_quantized_weights(graph, params_dict)
183 # Insert permutes before and after each conv op to ensure correct order.
184 torch._C._jit_pass_onnx_quantization_insert_permutes(graph, params_dict)
RuntimeError: bad optional access