I have a model which is trained in Kaldi and I’m able to load the model parameters in PyTorch as tensors.

I am trying to perform post-training quantization of the weight matrices, and I have tried the `quantize_per_tensor` function.

For example:

```
import torch

a = torch.rand(10)
b = torch.rand(10)

qmin, qmax = -128, 127  # qint8 range

min_a, max_a = a.min().item(), a.max().item()
min_b, max_b = b.min().item(), b.max().item()

scale_a = (max_a - min_a) / (qmax - qmin)
zpt_a = int(max(qmin, min(qmax, round(qmin - min_a / scale_a))))
scale_b = (max_b - min_b) / (qmax - qmin)
zpt_b = int(max(qmin, min(qmax, round(qmin - min_b / scale_b))))

a_quant = torch.quantize_per_tensor(a, scale_a, zpt_a, torch.qint8)
b_quant = torch.quantize_per_tensor(b, scale_b, zpt_b, torch.qint8)
a_quant + b_quant  # raises the RuntimeError below
```

When I add the two quantized tensors, I get the error below:

```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPU' backend. 'aten::add.Tensor' is only available for these backends: [CPU, CUDA, MkldnnCPU, SparseCPU, SparseCUDA, Meta, Named, Autograd, Profiler, Tracer].
```

It seems that I can convert fp32 to int8, but I cannot perform any integer arithmetic on the quantized tensors.
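From what I can tell (this is my assumption, not something I found documented for this exact case), the plain `+` operator has no kernel registered for the `QuantizedCPU` backend, so the sum has to go either through a dedicated quantized op such as `torch.ops.quantized.add`, which takes an output scale and zero point, or through an explicit dequantize → add → requantize round trip. A minimal sketch of both:

```python
import torch

a = torch.rand(10)
b = torch.rand(10)

# Shared example parameters; in practice these would come from the
# per-tensor min/max statistics as in the snippet above.
scale, zero_point = 0.01, 0

a_quant = torch.quantize_per_tensor(a, scale, zero_point, torch.qint8)
b_quant = torch.quantize_per_tensor(b, scale, zero_point, torch.qint8)

# Option 1: dedicated quantized kernel. The output scale/zero_point
# must be supplied, since the sum generally needs its own range.
sum_q = torch.ops.quantized.add(a_quant, b_quant, scale, zero_point)

# Option 2: float round trip, then requantize.
sum_f = torch.quantize_per_tensor(
    a_quant.dequantize() + b_quant.dequantize(), scale, zero_point, torch.qint8
)
```

Is one of these the intended way to do arithmetic on quantized tensors, or is there a better approach?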

Any help as to how to use this will be appreciated.

Thanks!