RuntimeError: quantized::conv(FBGEMM): Expected activation data type QUInt8 but got QInt8

Hello,
I was trying to quantize a simple model with qint8 for both activations and weights, using qconfig (2) below, because my goal is to quantize -> convert to ONNX -> deploy on TensorRT.

I replace
(1) float_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
with
(2) QConfig(activation=HistogramObserver.with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric), weight=default_per_channel_weight_observer)
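
For reference, here is the full form of configuration (2) with the imports it relies on (a minimal sketch using the eager-mode torch.quantization namespace):

import torch
from torch.quantization import (
    QConfig,
    HistogramObserver,
    default_per_channel_weight_observer,
)

# symmetric qint8 activations, per-channel qint8 weights
qint8_qconfig = QConfig(
    activation=HistogramObserver.with_args(dtype=torch.qint8,
                                           qscheme=torch.per_tensor_symmetric),
    weight=default_per_channel_weight_observer,
)
float_model.qconfig = qint8_qconfig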

My model looks like this:
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.relu4 = nn.ReLU()
        self.flatten = nn.Flatten()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):            # input(3, 32, 32)
        x = self.quant(x)
        x = self.conv1(x)
        x = self.relu1(x)    # output(16, 28, 28)
        x = self.pool1(x)            # output(16, 14, 14)
        x = self.conv2(x)
        x = self.relu2(x)    # output(32, 10, 10)
        x = self.pool2(x)            # output(32, 5, 5)
#        x = x.view(-1, 32*5*5)       # output(32*5*5)
#        x = torch.reshape(x, (16, 32*5*5))
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu3(x)      # output(120)
        x = self.fc2(x)
        x = self.relu4(x)      # output(84)
        x = self.fc3(x)              # output(10)
        x = self.dequant(x)
        return x
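
For reference, a minimal sketch of the eager-mode prepare/calibrate/convert flow that reproduces the error in the title (a random tensor stands in for real calibration data):

model = LeNet().eval()
model.qconfig = qint8_qconfig          # the symmetric qint8 QConfig from above

prepared = torch.quantization.prepare(model)
prepared(torch.randn(1, 3, 32, 32))    # calibration pass to collect activation ranges
quantized = torch.quantization.convert(prepared)

# The FBGEMM backend's quantized::conv kernels accept only quint8 activations,
# so running the converted model raises:
# RuntimeError: quantized::conv(FBGEMM): Expected activation data type QUInt8 but got QInt8
quantized(torch.randn(1, 3, 32, 32))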

However, if I use (1)'s qconfig, which uses quint8, TensorRT will raise an error that it doesn't support uint8.

If I want to quantize in PyTorch and deploy on TensorRT, what should I do? Could you please tell me?
Thanks!

Same problem here. I now use NVIDIA's pytorch-quantization toolkit instead. Still waiting for a solution.
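
For anyone landing here, a minimal sketch of the pytorch-quantization route mentioned above; the calibration step is omitted and exact API details may differ by toolkit version:

import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()             # monkey-patch nn.Conv2d / nn.Linear etc. with quantized versions
model = LeNet().eval()                 # layers now carry TensorQuantizer (fake-quant) nodes

# ... run calibration on representative data here ...

quant_nn.TensorQuantizer.use_fb_fake_quant = True   # export quantizers as ONNX Q/DQ nodes
torch.onnx.export(model, torch.randn(1, 3, 32, 32), "lenet_qdq.onnx", opset_version=13)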

It seems it is possible: TensorRT/test_quant_trt.py at main · pytorch/TensorRT · GitHub
However, it is immature at the moment.
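
For context, the linked test exercises FX graph mode quantization. A rough sketch of that path (not the exact contents of test_quant_trt.py): use a symmetric qint8 qconfig and convert to a reference quantized model, which keeps explicit quantize/dequantize ops instead of FBGEMM kernels, so it avoids the quint8 restriction; whether the downstream lowering accepts it depends on the Torch-TensorRT version.

import torch
from torch.ao.quantization import (
    QConfig,
    QConfigMapping,
    HistogramObserver,
    default_per_channel_weight_observer,
)
from torch.ao.quantization.quantize_fx import prepare_fx, convert_to_reference_fx

qconfig = QConfig(
    activation=HistogramObserver.with_args(dtype=torch.qint8,
                                           qscheme=torch.per_tensor_symmetric),
    weight=default_per_channel_weight_observer,
)
example_inputs = (torch.randn(1, 3, 32, 32),)

model = LeNet().eval()                 # the LeNet from the first post
prepared = prepare_fx(model, QConfigMapping().set_global(qconfig), example_inputs)
prepared(*example_inputs)              # calibration pass
reference = convert_to_reference_fx(prepared)   # explicit q/dq ops, no FBGEMM kernels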


GPU inference is also one of our FAQs: Quantization — PyTorch 2.0 documentation