How to specify input and output types

Hello,

in TensorFlow I can specify my desired input/output types when using the converter for quantization, like this:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.int8

Is something like this possible in PyTorch? Specifically, the last two lines are of interest to me. Thanks!

Yes, it is supported. For eager mode quantization you'd need to make this happen manually, e.g. don't insert a QuantStub at the beginning of the model and don't insert a DeQuantStub at the end of the model.

For FX graph mode quantization, here is the API you would use: PrepareCustomConfig — PyTorch 2.0 documentation, and here is a test: pytorch/test_quantize_fx.py at main · pytorch/pytorch · GitHub
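
A minimal sketch of how that fits together (the toy one-conv model, input shape, and default qconfig mapping here are just placeholder choices; the set_input_quantized_indexes/set_output_quantized_indexes calls are the part that plays the role of the TFLite flags above):

import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.fx.custom_config import PrepareCustomConfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3)).eval()
example_inputs = (torch.randn(1, 3, 8, 8),)

# declare that input 0 arrives already quantized and output 0 should stay
# quantized, so no quantize/dequantize ops are inserted at the boundaries
prepare_custom_config = (
    PrepareCustomConfig()
    .set_input_quantized_indexes([0])
    .set_output_quantized_indexes([0])
)

prepared = prepare_fx(
    model,
    get_default_qconfig_mapping(),
    example_inputs,
    prepare_custom_config=prepare_custom_config,
)
prepared(*example_inputs)  # calibration still runs on float data
quantized = convert_fx(prepared)
# `quantized` now expects a quantized tensor as input and returns one as output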

Thanks! Is there an example of how to do this in eager mode? I can't find anything specific on it.

We don't have test examples for this right now, I think; it will be something like the following:

import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(3, 3, 3)
        self.conv2 = torch.nn.Conv2d(3, 3, 3)

    def forward(self, x):
        # after quantization conv1 becomes a quantized conv, so it accepts quantized inputs
        x = self.conv1(x)
        x = self.conv2(x)
        # after quantization conv2 becomes a quantized conv, so it returns a quantized output
        return x


m = M().eval()
m.conv1.qconfig = torch.ao.quantization.default_qconfig
m.conv2.qconfig = torch.ao.quantization.default_qconfig

m = torch.ao.quantization.prepare(m)
# calibration: run representative float data through the prepared model
# so the observers can record activation ranges
m = torch.ao.quantization.convert(m)

# scale, zero_point and dtype are placeholders you must supply yourself,
# e.g. from an observer (see the follow-up below)
quantized_tensor = torch.quantize_per_tensor(float_tensor, scale, zero_point, dtype)
# inference
output = m(quantized_tensor)

# output should be quantized as well
assert output.is_quantized


Thanks! Two small questions about this:

  1. How can I get scale and zero_point to correctly use quantize_per_tensor?
  2. The quantized model now seems to give quint8 or qint8 data types. Is there a way to change these to regular int8 or uint8 types?

1. You can use quantize_per_tensor_dynamic, which will calculate the quantization parameters from that single input and quantize it, or use an observer (pytorch/observer.py at main · pytorch/pytorch · GitHub) to get the quantization parameters; see the sketch below.
2. You can get regular tensors by calling quantized_tensor.int_repr(), also shown below.
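
Putting both answers together, a minimal sketch (MinMaxObserver and the quint8 dtype are just example choices here, and float_tensor stands in for your real input):

import torch
from torch.ao.quantization.observer import MinMaxObserver

float_tensor = torch.randn(1, 3, 8, 8)  # stand-in for the real float input

# option for 1: calculate qparams from this single tensor and quantize it
q_dyn = torch.quantize_per_tensor_dynamic(float_tensor, torch.quint8, False)

# other option for 1: run representative data through an observer,
# then reuse its qparams to quantize inputs at inference time
observer = MinMaxObserver(dtype=torch.quint8)
observer(float_tensor)
scale, zero_point = observer.calculate_qparams()
q_obs = torch.quantize_per_tensor(
    float_tensor, float(scale), int(zero_point), torch.quint8
)

# for 2: int_repr() returns the underlying integer data as a regular tensor
print(q_obs.int_repr().dtype)  # torch.uint8 (torch.int8 for qint8 tensors)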