Lowering exir to quantized operations (no delegate)

Hi,

I'm working with ExecuTorch and torch.ao to enable quantization-aware training (QAT) in our workflow, and possibly to lower a quantized implementation through ExecuTorch without a delegate. I wrote a Quantizer that can be configured to match multiple qschemes, and I get a properly annotated graph that can be used for QAT. After pt2e conversion I get a model with inserted quantize/dequantize nodes around the floating-point ATen operators, which I can convert to edge.

PyTorch has a set of quantized ATen operators (pytorch/aten/src/ATen/native/quantized at main · pytorch/pytorch · GitHub), and I'd like to be able to use them to run model inference. Looking through the ExecuTorch code, there is a mention of `replace_quantized_partition_with_op` in `exir/backend/utils`, but I don't see it used anywhere in the repository. Is there any way to replace dq/op/q partitions in the exir graph with exir/ATen quantized operators without using a delegate? In `executorch/kernels/aten/functions.yaml` no quantized ATen operators are registered, and the quantized folder only registers a very limited set of operations (mainly quantize/dequantize ops). Is there any plan to support quantized operators in the runtime without a delegate?
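For anyone wanting to prototype the dq/op/q replacement idea outside ExecuTorch's internals, torch.fx's subgraph rewriter can express this kind of pattern substitution. The sketch below is illustrative only: it uses a toy `fake_quant_dequant` stand-in and a hypothetical "fused" replacement, not the real `torch.ops.quantized_decomposed.*` nodes that pt2e conversion inserts, and it says nothing about whether the resulting ops would have runtime kernels.

```python
import torch
from torch.fx import symbolic_trace, subgraph_rewriter

# Toy stand-in for a quantize -> dequantize round trip. A real pass would
# match the torch.ops.quantized_decomposed.* nodes inserted by pt2e
# conversion; this simplified version just illustrates the mechanics.
def fake_quant_dequant(x, scale):
    return torch.round(x / scale) * scale

class M(torch.nn.Module):
    def forward(self, x, scale):
        # dq/op/q-style pattern: (de)quantize, then a float op
        return torch.relu(fake_quant_dequant(x, scale))

def pattern(x, scale):
    return torch.relu(fake_quant_dequant(x, scale))

def replacement(x, scale):
    # Hypothetical "fused quantized relu": run the op, then requantize.
    # In a real pass this would be a call to a quantized operator.
    return fake_quant_dequant(torch.relu(x), scale)

gm = symbolic_trace(M())
matches = subgraph_rewriter.replace_pattern(gm, pattern, replacement)
```

After the rewrite, `gm` computes the replacement graph in place of every matched dq/op/q region; `matches` lists the matched subgraphs.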

I believe ATen mode is not supported in OSS (and it’s not supported on mobile/embedded devices anyway AFAIK), so I don’t think this would help you even if it existed.

> no delegate

Can you say more about why XNNPACK is not suitable for your use case?

> is there any plan to have quantized operators supported in the runtime with no delegate?

@mergennachin / @manuelcandales do either of you know if this is in the plan?