Hello,
I am trying to quantize a simple model using qint8 for both activations and weights, via the custom qconfig (2) shown below, because my goal is to quantize the model, convert it to ONNX, and deploy it on TensorRT.
I replace
(1) float_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
with
(2) QConfig(activation=HistogramObserver.with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric), weight=default_per_channel_weight_observer)
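To make the difference between the two settings concrete, here is a minimal pure-Python sketch of the arithmetic behind an affine quint8 mapping and a symmetric qint8 mapping. It is only an illustration under my own assumptions: the helper names are hypothetical, the symmetric scale uses the common max_abs / 127 convention, and none of this is the actual PyTorch observer or TensorRT code.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are not
# part of PyTorch or TensorRT.

def affine_quint8_params(min_val, max_val):
    """Affine (asymmetric) scheme: integer range [0, 255], non-zero zero_point."""
    # The representable range must contain 0.0 so that zero maps exactly.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / 255.0
    zero_point = round(-min_val / scale)
    return scale, max(0, min(255, zero_point))


def symmetric_qint8_params(min_val, max_val):
    """Symmetric scheme: integer range [-128, 127], zero_point fixed at 0."""
    max_abs = max(abs(min_val), abs(max_val))
    # Assumption: max_abs / 127 convention; libraries may use a slightly
    # different divisor.
    scale = max_abs / 127.0
    return scale, 0


def quantize(x, scale, zero_point, qmin, qmax):
    """Map a float to an integer code and clamp it to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


if __name__ == "__main__":
    lo, hi = -1.0, 3.0  # example observed activation range

    a_scale, a_zp = affine_quint8_params(lo, hi)
    s_scale, s_zp = symmetric_qint8_params(lo, hi)

    for x in (lo, 0.0, hi):
        print(
            x,
            "quint8:", quantize(x, a_scale, a_zp, 0, 255),
            "qint8:", quantize(x, s_scale, s_zp, -128, 127),
        )
```

The point of the sketch is that the symmetric scheme keeps the zero point at 0 and produces signed codes, which is what setting (2) asks for on the activations, while the affine scheme produces unsigned codes with a shifted zero point.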
My model looks like this:
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        # QuantStub/DeQuantStub mark where tensors enter and leave the quantized region
        self.quant = torch.quantization.QuantStub()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.relu4 = nn.ReLU()
        self.flatten = nn.Flatten()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):  # input(3, 32, 32)
        x = self.quant(x)
        x = self.conv1(x)
        x = self.relu1(x)  # output(16, 28, 28)
        x = self.pool1(x)  # output(16, 14, 14)
        x = self.conv2(x)
        x = self.relu2(x)  # output(32, 10, 10)
        x = self.pool2(x)  # output(32, 5, 5)
        # x = x.view(-1, 32*5*5)  # output(32*5*5)
        # x = torch.reshape(x, (16, 32*5*5))
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu3(x)  # output(120)
        x = self.fc2(x)
        x = self.relu4(x)  # output(84)
        x = self.fc3(x)  # output(10)
        x = self.dequant(x)
        return x
I made this change because, with qconfig (1), which uses quint8, TensorRT raises an error saying that it doesn't support uint8.
If I want to quantize the model in PyTorch and deploy it on TensorRT, what should I do? Could you tell me, please?
Thanks!