I want to quantize a model so that I can pass int8 values directly into the model after quantization. However, the tutorials all seem to assume that I still pass fp32 inputs, which are then converted internally using QuantStub, so I am not sure where to look for a better implementation.
My code looks like this:
import torch
import torch.nn as nn


class ConvModel(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, num_classes=0):
        super().__init__()
        self.c1 = nn.Conv2d(in_channels, out_channels, kernel_size)

    def forward(self, x):
        x = self.c1(x)
        return x


# Example dimensions (defined elsewhere in my actual code)
input_channels, input_height, input_width = 3, 32, 32
output_channels, kernel_size = 16, 3

# Quantize the model
input_fp = torch.rand(1, input_channels, input_height, input_width)  # NCHW, as nn.Conv2d expects
model = ConvModel(input_channels, output_channels, kernel_size)
model.eval()

# Specify quantization configuration
# Start with simple min/max range estimation and per-tensor quantization of weights
model.qconfig = torch.ao.quantization.default_qconfig
torch.ao.quantization.prepare(model, inplace=True)

# Pseudo-calibration
model(input_fp)

# Convert to quantized model
torch.ao.quantization.convert(model, inplace=True)

# Save the model
torch.jit.save(torch.jit.script(model), "conv2d_model_scripted_quantized.pth")

# Generate expected output data
input_matrix = torch.randint(0, 128, (1, input_channels, input_height, input_width), dtype=torch.int8)
expected_output = model(input_matrix)
With this, I get the following error message:
NotImplementedError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, QuantizedCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
I want to perform full-integer quantization, as the hardware I want to deploy my model on only supports integer arithmetic.
If I add the QuantStub layer I can at least run the model, but in that case the model will assume that the input, weights, and biases are fp32. For my use case, I need all of these parameters to be available as int8.
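For reference, the QuantStub version I mean looks roughly like this (a minimal sketch of the standard eager-mode pattern; the class name ConvModelStub and the stub placement are just illustration, not my exact code):

import torch
import torch.nn as nn


class ConvModelStub(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()      # quantizes the fp32 input
        self.c1 = nn.Conv2d(in_channels, out_channels, kernel_size)
        self.dequant = torch.ao.quantization.DeQuantStub()  # dequantizes back to fp32

    def forward(self, x):
        x = self.quant(x)    # the entry point still takes an fp32 tensor
        x = self.c1(x)
        x = self.dequant(x)
        return x

This version runs after prepare/convert, but the interface still takes an fp32 tensor and quantizes it inside the model, which is exactly the step I want to skip: I would like to hand the model int8 data directly.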