I want to quantize a model so that I can pass int8 values directly into the model after quantization. However, the tutorials all seem to assume that I still pass fp32 inputs, which are then converted internally using QuantStub, so I am not sure where to look for a better implementation.
My code looks like this:
import torch
import torch.nn as nn


class ConvModel(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, num_classes=0):
        super().__init__()
        self.c1 = nn.Conv2d(in_channels, out_channels, kernel_size)

    def forward(self, x):
        x = self.c1(x)
        return x


# Example dimensions (defined elsewhere in my actual code)
input_channels, input_height, input_width = 3, 32, 32
output_channels, kernel_size = 16, 3

# Quantize the model
input_fp = torch.rand(1, input_channels, input_height, input_width)  # NCHW, as nn.Conv2d expects
model = ConvModel(input_channels, output_channels, kernel_size)
model.eval()

# Specify quantization configuration
# Start with simple min/max range estimation and per-tensor quantization of weights
model.qconfig = torch.ao.quantization.default_qconfig
torch.ao.quantization.prepare(model, inplace=True)

# Pseudo-calibration
model(input_fp)

# Convert to quantized model
torch.ao.quantization.convert(model, inplace=True)

# Save the model
torch.jit.save(torch.jit.script(model), "conv2d_model_scripted_quantized.pth")

# Generate expected output data
input_matrix = torch.randint(0, 128, (1, input_channels, input_height, input_width), dtype=torch.int8)
expected_output = model(input_matrix)
With this, I get the following error message:
NotImplementedError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, QuantizedCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
I want to perform full-integer quantization, as the hardware I want to deploy my model on only supports integer arithmetic.
If I add the QuantStub layer I can at least run the model, but in that case the model will assume that the input, weights, and biases are fp32. For my use case, I need all of these parameters to be available as int8.
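For reference, the QuantStub version I mean looks roughly like this (a minimal sketch of the standard eager-mode pattern; the class name ConvModelStub and the stub placement are just illustration, not my exact code):

import torch
import torch.nn as nn


class ConvModelStub(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()      # quantizes the fp32 input
        self.c1 = nn.Conv2d(in_channels, out_channels, kernel_size)
        self.dequant = torch.ao.quantization.DeQuantStub()  # dequantizes back to fp32

    def forward(self, x):
        x = self.quant(x)    # the entry point still takes an fp32 tensor
        x = self.c1(x)
        x = self.dequant(x)
        return x

This version runs after prepare/convert, but the interface still takes an fp32 tensor and quantizes it inside the model, which is exactly the step I want to skip: I would like to hand the model int8 data directly.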