Supported quantized tensor operations

Is there a list of currently supported operations for quantized tensors?

I run into issues quantizing a network requiring tensor additions:

RuntimeError: Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPUTensorId' backend. 'aten::add.Tensor' is only available for these backends: [SparseCPUTensorId, CPUTensorId, VariableTensorId, MkldnnCPUTensorId].

Others have reported running into issues with div and cat operations - I presume these are also not supported atm.

In case this is helpful to anyone, there are:


that support add and other operations.


Add is supported, but not as aten::add.Tensor. The reason is that addition (like any arithmetic operation) on quantized tensors needs an output scale and zero_point. Quantized ops also often need to be stateful. Hence there are the stateful FloatFunctional and QFunctional modules.
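To illustrate why a quantized add needs its own output scale and zero_point, here is a minimal pure-Python sketch of affine quantization on single values. The helper names (quantize, dequantize, quantized_add) and the example parameters are made up for illustration; this is not how PyTorch implements the kernel, only the arithmetic idea behind it.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an unsigned 8-bit integer (affine quantization)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate real value."""
    return (q - zero_point) * scale


def quantized_add(qa, a_params, qb, b_params, out_params):
    """Add two quantized values; the result must be requantized with
    an output (scale, zero_point) that the caller has to supply."""
    real_sum = dequantize(qa, *a_params) + dequantize(qb, *b_params)
    return quantize(real_sum, *out_params)


# Hypothetical (scale, zero_point) pairs for the two inputs and the output.
a_params = (0.1, 0)
b_params = (0.5, 10)
out_params = (0.5, 0)

qa = quantize(3.0, *a_params)
qb = quantize(20.0, *b_params)

# Adding the raw integers ignores the differing scales and zero points.
naive = qa + qb

q_sum = quantized_add(qa, a_params, qb, b_params, out_params)
approx = dequantize(q_sum, *out_params)
print(naive, q_sum, approx)
```

The output parameters cannot be derived from the inputs alone, since they depend on the range of values the sum takes. That range is what gets observed during calibration, which is why the op has to live in a module that can hold state.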

Hey there :wave:
I don’t see how I can apply this solution :frowning:
Do I have to replace my out += residual operation in my model with QFunctional?
Regards LMW

No, you need to follow the static quantization flow and replace every occurrence of the addition with a FloatFunctional layer. For example, suppose you have a model that looks like this:

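import torch.nn as nn

class Foo(nn.Module):
  def __init__(self):
    super().__init__()

  def forward(self, x):
    # Plain operators: no module holds an output scale/zero_point
    y = x + x
    z = y + 2 * x
    return z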

It is not “quantizable”. The reason is that the additions and the multiplication happen directly in forward, so nothing stores the scale and zero_point that quantization needs.

Your quantizable model will look something like this:

import torch.nn as nn

class Foo(nn.Module):
  def __init__(self):
    super().__init__()
    # One FloatFunctional per arithmetic operation
    self.first_functional = nn.quantized.FloatFunctional()
    self.second_functional = nn.quantized.FloatFunctional()
    self.third_functional = nn.quantized.FloatFunctional()

  def forward(self, x):
    y = self.first_functional.add(x, x)
    # Multiplying by a Python scalar uses mul_scalar
    x2 = self.second_functional.mul_scalar(x, 2)
    z = self.third_functional.add(y, x2)
    return z

Notice that we don’t reuse the float functional modules; we use a separate one for every arithmetic operation. This is done so that the quantization parameters for each operation are computed independently.
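For the out += residual case asked about earlier, the same idea might look roughly like the sketch below. The block itself (ResidualBlock, its channel count, and the skip_add attribute name) is hypothetical and only stands in for whatever the real model does; the point is that the in-place add is replaced by a call on a dedicated FloatFunctional.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Hypothetical residual block; the layer sizes are illustrative only."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # Dedicated functional for the skip connection's addition
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x):
        residual = x
        out = self.relu(self.conv(x))
        # Previously: out += residual
        out = self.skip_add.add(out, residual)
        return out


block = ResidualBlock().eval()
x = torch.randn(1, 8, 4, 4)
with torch.no_grad():
    y = block(x)
```

Before conversion, FloatFunctional.add behaves like an ordinary float addition, so the float model's outputs should not change after this rewrite.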

Once you rewrite your models, you can run the static quantization (prepare, calibrate, convert steps): Quantization — PyTorch 1.7.1 documentation
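As a rough sketch of those three steps for the Foo model above: the version below additionally wraps the forward pass in QuantStub and DeQuantStub so that the input is quantized on entry and the output is converted back to float. The input shape, the random calibration data, and picking the qconfig from torch.backends.quantized.engine are assumptions made for this example; real calibration should use representative inputs.

```python
import torch
import torch.nn as nn


class Foo(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float
        self.first_functional = nn.quantized.FloatFunctional()
        self.second_functional = nn.quantized.FloatFunctional()
        self.third_functional = nn.quantized.FloatFunctional()

    def forward(self, x):
        x = self.quant(x)
        y = self.first_functional.add(x, x)
        x2 = self.second_functional.mul_scalar(x, 2.0)
        z = self.third_functional.add(y, x2)
        return self.dequant(z)


model = Foo().eval()

# Assumption: use the default qconfig of whichever quantized engine is active.
model.qconfig = torch.quantization.get_default_qconfig(
    torch.backends.quantized.engine
)

# 1. prepare: insert observers that record activation ranges
prepared = torch.quantization.prepare(model)

# 2. calibrate: run sample data through the model (random here, as a placeholder)
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(4, 16))

# 3. convert: swap FloatFunctional for QFunctional with fixed scale/zero_point
quantized = torch.quantization.convert(prepared)

with torch.no_grad():
    out = quantized(torch.randn(4, 16))
```

After convert, each add runs as a quantized op with the scale and zero_point gathered for that particular functional during calibration.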