ONNX export of quantized model

The exporter does support PyTorch QAT models right now. You should be able to export this model without operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK.
The default export type should work.
Please let me know if you’re facing any issues.
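
For reference, here is a minimal sketch of the flow described above: eager-mode QAT, convert, then a plain torch.onnx.export with the default export type. The TinyNet model, the fbgemm backend choice, and opset_version=13 are placeholders/assumptions for illustration (quantized ops only map to ONNX from opset 10 onwards); whether the export actually succeeds depends on the PyTorch and ONNX versions, as the rest of this thread shows.

import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where int8 quantization starts
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # marks where we return to float

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
model_prepared = torch.quantization.prepare_qat(model.train())
# ... run a few training batches here so the fake-quant observers see data ...
model_int8 = torch.quantization.convert(model_prepared.eval())

# Default export type, no ONNX_ATEN_FALLBACK
torch.onnx.export(model_int8, torch.randn(1, 3, 32, 32), 'qat_model.onnx', opset_version=13)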

@neginraoof Can you post a small example showing how to export a quantized model? Should it work with static quantization as well as QAT? I'm on 1.8.1 and tried @G4V's example from here, but I still get the following error even with ONNX_ATEN_FALLBACK:

AttributeError: 'torch.dtype' object has no attribute 'detach'

@neginraoof @addisonklinke
In my case torch.quantization.convert creates an additional bias with value None for some layers, even though there is no bias in the full-precision model.

Then, during torch.onnx.export, torch.jit._unique_state_dict complains about calling detach() on a NoneType, since it expects a Tensor there.

torch.__version__
1.9.0+cu111

Below is the code to quickly reproduce that:

import torch
import torch.nn as nn
import onnx
import onnxruntime as ort
import numpy as np
import copy

from torch.quantization import QuantStub, DeQuantStub

def print_model(model, tag):

    print(tag.upper(), 'MODEL')
    print(model)
    for item in model.state_dict().items():
        try:
            print(item[0], item[1].shape)
        except AttributeError:
            # non-tensor entries (e.g. a None bias or a dtype) have no .shape
            print(item[0], item[1])

def check_onnx_export(model, x, tag):

    model.eval()
    print('\nEXPORTING', tag.upper(), 'TO ONNX')
    path = 'tmp/test-{}.onnx'.format(tag)
    torch_output = model(x).detach()
    torch.onnx.export(model, x, path, verbose=True)
    # torch.onnx.export(model, x, path, verbose=True, operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
    print('CHECKING')
    model = onnx.load(path)
    onnx.checker.check_model(model)
    ort_session = ort.InferenceSession(path)
    ort_outputs = ort_session.run(None, {'input.1': np.array(x).astype(np.float32)})
    print(torch_output.shape, ort_outputs[0].shape)
    np.testing.assert_allclose(np.array(torch_output), ort_outputs[0], rtol=1e-03, atol=1e-05)
    print('FINISH')

def fuse(model):

    model_fused = copy.deepcopy(model)
    for m in model_fused.modules():
        if type(m) is Conv:
            torch.quantization.fuse_modules(m, ['conv', 'bn'], inplace=True)
    return model_fused


class Conv(nn.Module):

    def __init__(self):    
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(3, 3, 1, bias=False)
        self.bn = nn.BatchNorm2d(3)
        self.quant = QuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.conv(x)
        x = self.bn(x)
        return x

class Model(nn.Module):

    def __init__(self):
        
        super(Model, self).__init__()
        self.cv1 = Conv()
        self.cv2 = nn.Conv2d(3, 3, 1, bias=False)
        self.dequant = DeQuantStub()        

    def forward(self, x):
        x = self.cv1(x)
        x = self.cv2(x)
        x = self.dequant(x)
        return x 


x = torch.rand(3, 3, 32, 32)
model = Model()
model.qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')
model_fused = fuse(model)                                 # fuse Conv+BN before QAT
model_qat = torch.quantization.prepare_qat(model_fused)   # insert fake-quant observers
model_qat.eval()
model_int8 = torch.quantization.convert(model_qat)        # convert to the int8 model

print_model(model, 'full')
print_model(model_int8, 'int8')
check_onnx_export(model_int8, x, 'int8')

UPD: Those additional biases set to None appear because torch.backends.quantized.engine does not match the backend passed to torch.quantization.get_default_qat_qconfig. When they do match (both set to qnnpack in my case) I get:
RuntimeError: Tried to trace <__torch__.torch.classes.quantized.Conv2dPackedParamsBase object at 0x52982b0> but it is not part of the active trace. Modules that are called during a trace must be registered as submodules of the thing being traced.
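
In other words, the fix for the None-bias part is to keep the two backend settings consistent before preparing/converting, e.g. in the repro script above:

# keep the runtime engine consistent with the qconfig backend (qnnpack here)
torch.backends.quantized.engine = 'qnnpack'
model.qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')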


UPD: If I script the model with torch.jit.script(model_int8) before torch.onnx.export, I get another error:
RuntimeError: Exporting the operator quantize_per_tensor to ONNX opset version 9 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.
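
quantize_per_tensor is only mapped to ONNX QuantizeLinear from opset 10 onwards, so explicitly requesting a newer opset may get past this particular error (model_int8, x and the path come from the repro script above):

torch.onnx.export(model_int8, x, 'tmp/test-int8.onnx', opset_version=13)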


Have you solved the problem?

As far as I know, not all quantized models can currently be exported.

Models quantized with pytorch-quantization can be exported to ONNX, assuming they will be executed by the TensorRT engine.

github link: TensorRT/tools/pytorch-quantization at master · NVIDIA/TensorRT · GitHub
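
Roughly, that flow looks like the sketch below (build_model() and the input shape are placeholders, not part of the library). Setting use_fb_fake_quant = True makes the fake-quant nodes export as ONNX QuantizeLinear/DequantizeLinear, which is also the usual way around the FakeTensorQuantFunction export error reported later in this thread:

import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()   # monkey-patch torch.nn layers with quantized equivalents
model = build_model()        # placeholder: your model constructor
# ... calibrate and/or fine-tune with QAT here ...

quant_nn.TensorQuantizer.use_fb_fake_quant = True   # export QDQ nodes instead of the custom op
torch.onnx.export(model.eval(), torch.randn(1, 3, 224, 224), 'model_qat.onnx', opset_version=13)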

I hit the same issue: I can quantize and calibrate the model using torch.fx,

but I cannot migrate the quantized model to another inference engine via ONNX.

IMO, the weights at least should work on the same platform.

I would recommend a PyTorch library called Brevitas for quantization. It also has support for a novel ONNX variant called QONNX.
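
A minimal sketch of that path, assuming a recent Brevitas release where brevitas.export.export_qonnx is available (the export entry point has changed across versions, so treat the names below as an assumption):

import torch
import brevitas.nn as qnn
from brevitas.export import export_qonnx  # assumption: present in recent Brevitas versions

# Quantized layers are defined with Brevitas' drop-in modules
model = torch.nn.Sequential(
    qnn.QuantIdentity(return_quant_tensor=True),   # quantize the input
    qnn.QuantConv2d(3, 8, 3, weight_bit_width=8),
)
export_qonnx(model.eval(), torch.randn(1, 3, 32, 32), export_path='model_qonnx.onnx')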

Same issue here; I can't see any sign of the support that @neginraoof claimed.

Thanks for your advice. Is Brevitas able to convert an FX-quantized torch model to ONNX?

I saw it uses some QConv modules to construct the model and then convert it. I don't want that.

Hi, I tried to export a model quantized by pytorch-quantization, but I get this error: RuntimeError: ONNX export failed: Couldn't export Python operator FakeTensorQuantFunction


Hello,
Any news on this?
Can we export ONNX models based on quantization-aware training?

Hey, in general our team is not working on ONNX support, so there's unlikely to be any new movement in that direction, given that PyTorch 2.0 is intended to fill a similar gap.

So, are you working on other ways to export quantized models that will be deployed on edge devices / embedded systems?

Thanks,

Yeah, we are working on the next version of quantization on top of PyTorch 2.0. It will work with ExecuTorch, which is a new stack for on-device inference. It might be announced soon; please stay tuned for the next PTDC or PyTorch releases.

What is ExecuTorch? I can only find a little information about it. Are there any official documents about that?

It's private right now; we'll have an MVP release at the PyTorch Developer Conference this year in October: PyTorch Conference 2023: Join us in San Francisco October 16-17 | PyTorch

Any news on this: quantizing a PyTorch model to int8 and then converting it to a TensorRT engine?

It's not currently our priority; we need someone to implement a quantizer for TensorRT: How to Write a Quantizer for PyTorch 2 Export Quantization — PyTorch Tutorials 2.1.1+cu121 documentation. We welcome any contributions.
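
For anyone interested in contributing, the PT2 export quantization flow that such a TensorRT quantizer would plug into looks roughly like this (a sketch assuming PyTorch 2.1, with XNNPACKQuantizer standing in for the quantizer to be written and M as a placeholder model):

import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer, get_symmetric_quantization_config)

model = M().eval()                              # placeholder model
example_inputs = (torch.randn(1, 3, 32, 32),)
exported = capture_pre_autograd_graph(model, example_inputs)

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(exported, quantizer)    # insert observers per the quantizer's annotations
prepared(*example_inputs)                       # calibration
quantized = convert_pt2e(prepared)              # lower to quantize/dequantize ops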
