Is there a list of currently supported operations for quantized tensors?
I run into issues quantizing a network requiring tensor additions:
RuntimeError: Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPUTensorId' backend. 'aten::add.Tensor' is only available for these backends: [SparseCPUTensorId, CPUTensorId, VariableTensorId, MkldnnCPUTensorId].
Others have reported running into issues with div and cat operations - I presume these are also not supported at the moment.
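A minimal repro (the scale and zero_point values are arbitrary, just enough to produce a quantized tensor):

import torch

x = torch.rand(4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
qx + qx  # RuntimeError: Could not run 'aten::add.Tensor' ...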
Add is supported, but not as at::add.Tensor. The reason is that addition (or any arithmetic) on quantized tensors requires an output scale/zero_point, so quantized ops often need to be stateful. Hence, there are the stateful FloatFunctional and QFunctional modules.
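As a sketch of what "stateful" means here (a hand-rolled example, not the normal workflow: the output scale/zero_point below are made up, and in a real model the quantization flow fills them in; exact op signatures can vary between PyTorch versions):

import torch
import torch.nn.quantized as nnq

qx = torch.quantize_per_tensor(torch.rand(4), scale=0.1, zero_point=0, dtype=torch.quint8)

# Functional form: the caller must pass the output scale/zero_point explicitly.
y1 = torch.ops.quantized.add(qx, qx, 0.2, 0)

# Stateful form: QFunctional carries the output scale/zero_point as state.
qf = nnq.QFunctional()
qf.scale = 0.2
qf.zero_point = 0
y2 = qf.add(qx, qx)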
No, you need to follow the static quantization flow and replace every occurrence of the arithmetic with a FloatFunctional layer. For example, if you have a model that looks like this:
import torch.nn as nn

class Foo(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        y = x + x
        z = y + 2 * x
        return z
it is not “quantizable”. The reason is that the additions and the multiplication happen in the forward method, and so don’t store the scale and zero_point state needed for quantization.
Your quantizable model will look something like this:
class Foo(nn.Module):
    def __init__(self):
        super().__init__()
        self.first_functional = nn.quantized.FloatFunctional()
        self.second_functional = nn.quantized.FloatFunctional()
        self.third_functional = nn.quantized.FloatFunctional()

    def forward(self, x):
        y = self.first_functional.add(x, x)
        x2 = self.second_functional.mul_scalar(x, 2)
        z = self.third_functional.add(y, x2)
        return z
Notice that we don’t reuse the float functional modules: every arithmetic operation gets its own instance. This is done so that the quantization parameters (scale and zero_point) of each operation are observed and computed independently.
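To complete the picture, here is a sketch of the eager-mode static quantization flow around such a module (the name QuantizableFoo, the qconfig choice, and the random calibration data are just placeholders):

import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, default_qconfig, prepare, convert

class QuantizableFoo(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # quantizes the float input
        self.dequant = DeQuantStub()  # dequantizes the output back to float
        self.first_functional = nn.quantized.FloatFunctional()
        self.second_functional = nn.quantized.FloatFunctional()
        self.third_functional = nn.quantized.FloatFunctional()

    def forward(self, x):
        x = self.quant(x)
        y = self.first_functional.add(x, x)
        x2 = self.second_functional.mul_scalar(x, 2)
        z = self.third_functional.add(y, x2)
        return self.dequant(z)

model = QuantizableFoo().eval()
model.qconfig = default_qconfig
prepare(model, inplace=True)   # attaches observers, incl. to the FloatFunctionals
model(torch.rand(8, 8))        # calibration pass with representative data
convert(model, inplace=True)   # swaps FloatFunctional -> QFunctional, etc.
out = model(torch.rand(8, 8))  # now runs the quantized add/mul ops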