What's the supported datatype for activation in torch.ao.nn.quantized.linear?

supported datatype

What is the supported datatype for weight and activation in torch.ao.nn.quantized.modules.linear.Linear?
I haven’t seen any comments on the datatype. Are uint8 and int8 both supported for weight and activation?

Relationship between layer and interface

What is the relationship between torch.ao.nn.quantized.modules.linear.Linear and torch.ao.nn.quantized.functional.linear? Does modules.linear.Linear call functional.linear?

Why only uint8 supported for activation in functional interface linear?

In torch.ao.nn.quantized.functional.linear, the input is defined as a tensor of type torch.quint8. Why can’t it support torch.qint8? Why is only uint8 supported?

I think the permissible quantized datatypes largely depend on backend support. What backend are you using?
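As a side note, a minimal sketch for checking which quantized backend a given build is using (attribute names from torch.backends.quantized):

```python
import torch

# Engines compiled into this build, e.g. ['none', 'fbgemm', ...] (varies by platform)
print(torch.backends.quantized.supported_engines)

# The engine the quantized kernels will dispatch to right now
print(torch.backends.quantized.engine)
```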

I think both `torch.ao.nn.quantized.Linear` and `torch.ao.nn.quantized.functional.linear` call `torch.ops.quantized.linear` under the hood (from using `??` in Jupyter to look at the source).
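A sketch of how one might reproduce that check outside Jupyter, using inspect.getsource rather than `??`:

```python
import inspect
import torch.ao.nn.quantized as nnq
import torch.ao.nn.quantized.functional as qF

# The functional wrapper's source should show a call into
# torch.ops.quantized.linear (the actual kernel dispatch).
print(inspect.getsource(qF.linear))

# The module version keeps packed parameters and calls the same op
# in its forward method.
print(inspect.getsource(nnq.Linear.forward))
```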

Best regards

Thomas

supported datatype

Well, for weight it’s qint8 or fp16: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/linear.py#L30

For activation it depends on the backend, though the default qconfig uses an activation observer that targets quint8 for all the known backends.
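A minimal sketch to confirm both of those defaults, assuming the fbgemm backend is available:

```python
import torch
import torch.ao.nn.quantized as nnq
import torch.ao.quantization as tq

# A quantized Linear stores its weight as qint8 by default
qlinear = nnq.Linear(8, 8)
print(qlinear.weight().dtype)       # torch.qint8

# The default qconfig's observers: quint8 for activations, qint8 for weights
qconfig = tq.get_default_qconfig("fbgemm")
print(qconfig.activation().dtype)   # torch.quint8
print(qconfig.weight().dtype)       # torch.qint8
```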

Relationship between layer and interface

torch.ao.nn.quantized.modules.linear.Linear and torch.ao.nn.quantized.functional.linear are effectively the same thing; they’re just two ways to access the underlying kernel:

FX graph mode quantization uses the functionals, while eager mode uses the modules.
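For example, eager-mode post-training static quantization swaps nn.Linear for the quantized module, which then dispatches to that same kernel. A minimal sketch, assuming the fbgemm backend:

```python
import torch
import torch.ao.quantization as tq

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where fp32 -> quint8 happens
        self.fc = torch.nn.Linear(4, 4)
        self.dequant = tq.DeQuantStub()  # marks where quint8 -> fp32 happens

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().eval()
m.qconfig = tq.get_default_qconfig("fbgemm")
tq.prepare(m, inplace=True)
m(torch.randn(8, 4))                     # calibration pass
tq.convert(m, inplace=True)

print(type(m.fc))                        # the quantized Linear module
```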

Why only uint8 supported for activation in functional interface linear?

It’s actually only quint8 support, i.e. a quantization wrapper on a uint8 tensor. The reason we don’t support qint8 activations is generally just kernel availability: without a kernel to do the math, we can’t support it. The AO team doesn’t actually create fbgemm, xnnpack, qnnpack, etc.; we just create APIs to use those kernels.

As for why those teams don’t support qint8, I think it’s because quint8 tends to be used more for affine quantization and qint8 tends to be used more for symmetric quantization (since it’s symmetric about 0), and activations generally get affine quantization since they are often not centered around 0 (the output of a ReLU, for example).
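A small sketch of that point: for a non-negative ReLU output, an affine quint8 scheme can spend the full 8-bit range on [0, max], while a symmetric qint8 scheme has to cover [-max, max] and never uses half of its codes for this data (the numbers below are illustrative):

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0, 2.0])   # e.g. post-ReLU activations, all >= 0

# Affine quint8: scale chosen so [0, 2.0] maps onto codes 0..255
q_affine = torch.quantize_per_tensor(x, scale=2.0 / 255, zero_point=0, dtype=torch.quint8)

# Symmetric qint8: zero_point fixed at 0, range must span [-2.0, 2.0],
# so only codes 0..127 are ever used here -> coarser steps
q_symmetric = torch.quantize_per_tensor(x, scale=2.0 / 127, zero_point=0, dtype=torch.qint8)

print(q_affine.dequantize())     # finer resolution
print(q_symmetric.dequantize())  # roughly half the resolution
```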
