Hello,
in TensorFlow I can specify my desired input/output types when using the converter for quantization like this:
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.int8
Is something like this possible in PyTorch? Specifically, the last two lines are of interest to me. Thanks!
Yes, it is supported. For eager mode quantization you'd need to make this happen manually, e.g. don't insert a QuantStub at the beginning of the model and don't insert a DeQuantStub at the end of the model.
For FX graph mode quantization, here is the API you would use: PrepareCustomConfig — PyTorch 2.0 documentation, and here is a test: pytorch/test_quantize_fx.py at main · pytorch/pytorch · GitHub
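As a rough, untested sketch of what that looks like with prepare_fx/convert_fx (assuming the set_input_quantized_indexes / set_output_quantized_indexes methods from the PrepareCustomConfig docs above; please double-check the details against the linked test):

import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torch.ao.quantization.fx.custom_config import PrepareCustomConfig

model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3), torch.nn.Conv2d(3, 3, 3)).eval()
example_inputs = (torch.randn(1, 3, 8, 8),)

# declare that input 0 arrives already quantized and output 0 should stay quantized,
# so no quantize/dequantize ops are inserted at the model boundaries during convert
prepare_custom_config = (
    PrepareCustomConfig()
    .set_input_quantized_indexes([0])
    .set_output_quantized_indexes([0])
)

prepared = prepare_fx(
    model,
    get_default_qconfig_mapping(),
    example_inputs,
    prepare_custom_config=prepare_custom_config,
)
prepared(torch.randn(1, 3, 8, 8))  # calibration with representative float data
quantized = convert_fx(prepared)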
Thanks! Is there an example of how to do this in eager mode? I can't find anything specific on it.
We don't have test examples right now, I think, but it would be something like the following:
import torch
from torch.ao.quantization import default_qconfig

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(3, 3, 3)
        self.conv2 = torch.nn.Conv2d(3, 3, 3)

    def forward(self, x):
        # after quantization conv1 will become a quantized conv1, so it accepts quantized inputs
        x = self.conv1(x)
        x = self.conv2(x)
        # after quantization conv2 will become a quantized conv2, so it outputs a quantized tensor
        return x

m = M().eval()
# note: no QuantStub/DeQuantStub, so inputs and outputs stay quantized
m.conv1.qconfig = default_qconfig
m.conv2.qconfig = default_qconfig
m = torch.ao.quantization.prepare(m)
# calibration: run representative float data through the prepared (still float) model
m(torch.randn(1, 3, 8, 8))
m = torch.ao.quantization.convert(m)

# quantize the input manually, since there is no QuantStub in the model
# (scale/zero_point here are just placeholder values)
float_tensor = torch.randn(1, 3, 8, 8)
quantized_tensor = torch.quantize_per_tensor(float_tensor, scale=0.1, zero_point=0, dtype=torch.quint8)
# inference
output = m(quantized_tensor)
# output should be quantized as well
assert output.is_quantized
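Since there is no DeQuantStub at the end of the model, the output stays quantized; if you need a float tensor downstream, you can dequantize it manually with output.dequantize().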