What's the supported datatype for activation in torch.ao.nn.quantized.linear?

supported datatype

What is the supported datatype for weight and activation in torch.ao.nn.quantized.modules.linear.Linear?
I haven’t seen any comments on the datatype. Are uint8 and int8 both supported for weight and activation?

Relationship between layer and interface

What is the relationship between torch.ao.nn.quantized.modules.linear.Linear and torch.ao.nn.quantized.functional.linear? Does modules.linear.Linear call functional.linear?

Why only uint8 supported for activation in functional interface linear?

In torch.ao.nn.quantized.functional.linear, the input is defined as a tensor of type torch.quint8. Why can’t it support torch.qint8? Why is only uint8 supported?

I think the permissible quantized datatypes largely depend on backend support. What backend are you using?
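As a side note, a minimal sketch for checking which quantized backend a given build is using (attribute names from torch.backends.quantized):

```python
import torch

# Engines compiled into this build, e.g. ['none', 'fbgemm', ...] (varies by platform)
print(torch.backends.quantized.supported_engines)

# The engine the quantized kernels will dispatch to right now
print(torch.backends.quantized.engine)
```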

I think both `torch.ao.nn.quantized.Linear` and `torch.ao.nn.quantized.functional.linear` call `torch.ops.quantized.linear` under the hood (from using `??` in Jupyter to look at the source).
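A sketch of how one might reproduce that check outside Jupyter, using inspect.getsource rather than `??`:

```python
import inspect
import torch.ao.nn.quantized as nnq
import torch.ao.nn.quantized.functional as qF

# The functional wrapper's source should show a call into
# torch.ops.quantized.linear (the actual kernel dispatch).
print(inspect.getsource(qF.linear))

# The module version keeps packed parameters and calls the same op
# in its forward method.
print(inspect.getsource(nnq.Linear.forward))
```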

Best regards

Thomas

supported datatype

Well, for weight it’s qint8 or fp16: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/linear.py#L30

For activation it depends on the backend, though the default qconfig uses an activation observer that targets quint8 for all the known backends.
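A minimal sketch to confirm both of those defaults, assuming the fbgemm backend is available:

```python
import torch
import torch.ao.nn.quantized as nnq
import torch.ao.quantization as tq

# A quantized Linear stores its weight as qint8 by default
qlinear = nnq.Linear(8, 8)
print(qlinear.weight().dtype)       # torch.qint8

# The default qconfig's observers: quint8 for activations, qint8 for weights
qconfig = tq.get_default_qconfig("fbgemm")
print(qconfig.activation().dtype)   # torch.quint8
print(qconfig.weight().dtype)       # torch.qint8
```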

Relationship between layer and interface

torch.ao.nn.quantized.modules.linear.Linear and torch.ao.nn.quantized.functional.linear are effectively the same thing; they’re just two ways to access the underlying kernel:

FX graph mode quantization uses the functionals, while eager mode uses the modules.
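For example, eager-mode post-training static quantization swaps nn.Linear for the quantized module, which then dispatches to that same kernel. A minimal sketch, assuming the fbgemm backend:

```python
import torch
import torch.ao.quantization as tq

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where fp32 -> quint8 happens
        self.fc = torch.nn.Linear(4, 4)
        self.dequant = tq.DeQuantStub()  # marks where quint8 -> fp32 happens

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().eval()
m.qconfig = tq.get_default_qconfig("fbgemm")
tq.prepare(m, inplace=True)
m(torch.randn(8, 4))                     # calibration pass
tq.convert(m, inplace=True)

print(type(m.fc))                        # the quantized Linear module
```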

Why only uint8 supported for activation in functional interface linear?

It’s actually only quint8 support, i.e. a quantization wrapper on a uint8 tensor. The reason we don’t support qint8 activations is generally just kernel availability: without a kernel to do the math, we can’t support it. The AO team doesn’t actually create fbgemm, xnnpack, qnnpack, etc.; we just create APIs to use those kernels.

As for why those teams don’t support qint8, I think it’s because quint8 tends to be used more for affine quantization and qint8 tends to be used more for symmetric quantization (since it’s symmetric about 0), and activations generally get affine quantization since they are often not centered around 0 (the output of a ReLU, for example).
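A small sketch of that point: for a non-negative ReLU output, an affine quint8 scheme can spend the full 8-bit range on [0, max], while a symmetric qint8 scheme has to cover [-max, max] and never uses half of its codes for this data (the numbers below are illustrative):

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0, 2.0])   # e.g. post-ReLU activations, all >= 0

# Affine quint8: scale chosen so [0, 2.0] maps onto codes 0..255
q_affine = torch.quantize_per_tensor(x, scale=2.0 / 255, zero_point=0, dtype=torch.quint8)

# Symmetric qint8: zero_point fixed at 0, range must span [-2.0, 2.0],
# so only codes 0..127 are ever used here -> coarser steps
q_symmetric = torch.quantize_per_tensor(x, scale=2.0 / 127, zero_point=0, dtype=torch.qint8)

print(q_affine.dequantize())     # finer resolution
print(q_symmetric.dequantize())  # roughly half the resolution
```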
