ONNX export of quantized model

The exporter does support PyTorch QAT models right now. You should be able to export this model without operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK.
The default export type should work.
Please let me know if you’re facing any issues.
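
For reference, here is a minimal sketch of the flow described above: eager-mode QAT, convert, then a plain torch.onnx.export with the default export type. The TinyNet model, the fbgemm backend choice, and opset_version=13 are placeholders/assumptions for illustration (quantized ops only map to ONNX from opset 10 onwards); whether the export actually succeeds depends on the PyTorch and ONNX versions, as the rest of this thread shows.

import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where int8 quantization starts
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # marks where we return to float

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
model_prepared = torch.quantization.prepare_qat(model.train())
# ... run a few training batches here so the fake-quant observers see data ...
model_int8 = torch.quantization.convert(model_prepared.eval())

# Default export type, no ONNX_ATEN_FALLBACK
torch.onnx.export(model_int8, torch.randn(1, 3, 32, 32), 'qat_model.onnx', opset_version=13)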

@neginraoof Can you post a small example showing how to export a quantized model? Should it work with static quantization as well as QAT? I'm on 1.8.1 and tried @G4V's example from here, but I still get the following error even with ONNX_ATEN_FALLBACK:

AttributeError: 'torch.dtype' object has no attribute 'detach'

@neginraoof @addisonklinke
In my case torch.quantization.convert creates an additional bias with value None for some layers, even though there is no bias in the full-precision model.

Then, during torch.onnx.export, torch.jit._unique_state_dict complains about calling detach() on a NoneType, since it expects a Tensor there.

torch.__version__
1.9.0+cu111

Below is the code to quickly reproduce that:

import torch
import torch.nn as nn
import onnx
import onnxruntime as ort
import numpy as np
import copy

from torch.quantization import QuantStub, DeQuantStub

def print_model(model, tag):

    print(tag.upper(), 'MODEL')
    print(model)
    for item in model.state_dict().items():
        try:
            print(item[0], item[1].shape)
        except AttributeError:
            # non-tensor entries (e.g. a None bias or a dtype) have no .shape
            print(item[0], item[1])

def check_onnx_export(model, x, tag):

    model.eval()
    print('\nEXPORTING', tag.upper(), 'TO ONNX')
    path = 'tmp/test-{}.onnx'.format(tag)
    torch_output = model(x).detach()
    torch.onnx.export(model, x, path, verbose=True)
    # torch.onnx.export(model, x, path, verbose=True, operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
    print('CHECKING')
    model = onnx.load(path)
    onnx.checker.check_model(model)
    ort_session = ort.InferenceSession(path)
    ort_outputs = ort_session.run(None, {'input.1': np.array(x).astype(np.float32)})
    print(torch_output.shape, ort_outputs[0].shape)
    np.testing.assert_allclose(np.array(torch_output), ort_outputs[0], rtol=1e-03, atol=1e-05)
    print('FINISH')

def fuse(model):

    model_fused = copy.deepcopy(model)
    for m in model_fused.modules():
        if type(m) is Conv:
            torch.quantization.fuse_modules(m, ['conv', 'bn'], inplace=True)
    return model_fused


class Conv(nn.Module):

    def __init__(self):    
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(3, 3, 1, bias=False)
        self.bn = nn.BatchNorm2d(3)
        self.quant = QuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv(x)
        x = self.bn(x)
        return x

class Model(nn.Module):

    def __init__(self):
        
        super(Model, self).__init__()
        self.cv1 = Conv()
        self.cv2 = nn.Conv2d(3, 3, 1, bias=False)
        self.dequant = DeQuantStub()        

    def forward(self, x):
        x = self.cv1(x)
        x = self.cv2(x)
        x = self.dequant(x)
        return x 


x = torch.rand(3, 3, 32, 32)
model = Model()
model.qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')
model_fused = fuse(model)                                 # fuse Conv+BN before QAT
model_qat = torch.quantization.prepare_qat(model_fused)   # insert fake-quant observers
model_qat.eval()
model_int8 = torch.quantization.convert(model_qat)        # convert to the int8 model

print_model(model, 'full')
print_model(model_int8, 'int8')
check_onnx_export(model_int8, x, 'int8')

UPD: Those additional biases set to None appear because torch.backends.quantized.engine does not match the backend passed to torch.quantization.get_default_qat_qconfig. When they do match (both set to qnnpack in my case) I get:
RuntimeError: Tried to trace <__torch__.torch.classes.quantized.Conv2dPackedParamsBase object at 0x52982b0> but it is not part of the active trace. Modules that are called during a trace must be registered as submodules of the thing being traced.
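
In other words, the fix for the None-bias part is to keep the two backend settings consistent before preparing/converting, e.g. in the repro script above:

# keep the runtime engine consistent with the qconfig backend (qnnpack here)
torch.backends.quantized.engine = 'qnnpack'
model.qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')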


UPD: If I script the model with torch.jit.script(model_int8) before torch.onnx.export, I get another error:
RuntimeError: Exporting the operator quantize_per_tensor to ONNX opset version 9 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.
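
quantize_per_tensor is only mapped to ONNX QuantizeLinear from opset 10 onwards, so explicitly requesting a newer opset may get past this particular error (model_int8, x and the path come from the repro script above):

torch.onnx.export(model_int8, x, 'tmp/test-int8.onnx', opset_version=13)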


Have you solved the problem?

As far as I know, not all quantized models can currently be exported.

Models quantized with pytorch-quantization can be exported to ONNX, assuming they will be executed by the TensorRT engine.

github link: TensorRT/tools/pytorch-quantization at master · NVIDIA/TensorRT · GitHub
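
Roughly, that flow looks like the sketch below (build_model() and the input shape are placeholders, not part of the library). Setting use_fb_fake_quant = True makes the fake-quant nodes export as ONNX QuantizeLinear/DequantizeLinear, which is also the usual way around the FakeTensorQuantFunction export error reported later in this thread:

import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()   # monkey-patch torch.nn layers with quantized equivalents
model = build_model()        # placeholder: your model constructor
# ... calibrate and/or fine-tune with QAT here ...

quant_nn.TensorQuantizer.use_fb_fake_quant = True   # export QDQ nodes instead of the custom op
torch.onnx.export(model.eval(), torch.randn(1, 3, 224, 224), 'model_qat.onnx', opset_version=13)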

I hit the same issue: I can quantize and calibrate the model using torch.fx,

but I cannot migrate the quantized model to another inference engine via ONNX.

IMO, the weights at least should work on the same platform.

I would recommend a PyTorch library called Brevitas for quantization. It also has support for a novel ONNX variant called QONNX.
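
A minimal sketch of that path, assuming a recent Brevitas release where brevitas.export.export_qonnx is available (the export entry point has changed across versions, so treat the names below as an assumption):

import torch
import brevitas.nn as qnn
from brevitas.export import export_qonnx  # assumption: present in recent Brevitas versions

# Quantized layers are defined with Brevitas' drop-in modules
model = torch.nn.Sequential(
    qnn.QuantIdentity(return_quant_tensor=True),   # quantize the input
    qnn.QuantConv2d(3, 8, 3, weight_bit_width=8),
)
export_qonnx(model.eval(), torch.randn(1, 3, 32, 32), export_path='model_qonnx.onnx')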

Same issue here; I can't see any sign of the support that @neginraoof claimed.

Thanks for your advice. Is Brevitas able to convert an FX-quantized torch model to ONNX?

I saw it uses some QConv modules to construct the model and then convert it. I don't want that.

Hi, I tried to export a model quantized by pytorch-quantization, but I get this error: RuntimeError: ONNX export failed: Couldn't export Python operator FakeTensorQuantFunction


Hello,
Any news on this?
Can we export ONNX models based on quantization-aware training?

Hey, in general our team is not working on ONNX support, so there's unlikely to be any new movement in that direction, given that PyTorch 2.0 is intended to fill a similar gap.

So, are you working on other ways to export quantized models that will be deployed on edge devices / embedded systems?

Thanks,

Yeah, we are working on the next version of quantization on top of PyTorch 2.0. It will work with ExecuTorch, which is a new stack for on-device inference. It might be announced soon; please stay tuned for the next PTDC or PyTorch releases.

What is ExecuTorch? I can only find a little information about it. Are there any official documents about that?

It's private right now; we'll have an MVP release at the PyTorch Developer Conference this year in October: PyTorch Conference 2023: Join us in San Francisco October 16-17 | PyTorch

Any news on this: quantizing a PyTorch model to int8 and then converting it to a TensorRT engine?

It's not currently our priority; we need someone to implement a quantizer for TensorRT: How to Write a Quantizer for PyTorch 2 Export Quantization — PyTorch Tutorials 2.1.1+cu121 documentation. We welcome any contributions.
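
For anyone interested in contributing, the PT2 export quantization flow that such a TensorRT quantizer would plug into looks roughly like this (a sketch assuming PyTorch 2.1, with XNNPACKQuantizer standing in for the quantizer to be written and M as a placeholder model):

import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer, get_symmetric_quantization_config)

model = M().eval()                              # placeholder model
example_inputs = (torch.randn(1, 3, 32, 32),)
exported = capture_pre_autograd_graph(model, example_inputs)

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(exported, quantizer)    # insert observers per the quantizer's annotations
prepared(*example_inputs)                       # calibration
quantized = convert_pt2e(prepared)              # lower to quantize/dequantize ops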
