Dynamic Quantization accuracy loss

Hi everyone,

I am trying dynamic quantization, and my model is mostly composed of linear layers, so I can't use other quantization techniques. With dynamic quantization to qint8 the model size was reduced, but so was the accuracy. Is there any way I can improve the accuracy?

Static quantization supports Linear modules, if you’d like to try it out. Some accuracy drop is usually expected, but it can be controlled by quantizing only selected layers in your model rather than all of them.
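To illustrate the selective approach, here is a minimal sketch, assuming a PyTorch version where the eager-mode torch.ao.quantization API is available. TwoLayer, fc1 and head are hypothetical names made up for this example, and the random calibration batches stand in for real data. Setting qconfig to None on a submodule should leave it in float:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qconfig, prepare, convert,
)

class TwoLayer(nn.Module):  # hypothetical toy model
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # float -> quantized boundary
        self.fc1 = nn.Linear(16, 32)  # will be quantized
        self.dequant = DeQuantStub()  # quantized -> float boundary
        self.head = nn.Linear(32, 4)  # kept in float

    def forward(self, x):
        x = self.dequant(self.fc1(self.quant(x)))
        return self.head(x)

# Pick an engine that this build supports (fbgemm on x86, qnnpack on ARM).
engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend

model = TwoLayer().eval()
model.qconfig = get_default_qconfig(backend)
model.head.qconfig = None  # opt this layer out of quantization

prepared = prepare(model, inplace=False)
with torch.no_grad():
    for _ in range(8):  # calibration pass; random data is a placeholder
        prepared(torch.randn(32, 16))
quantized = convert(prepared, inplace=False)

out = quantized(torch.randn(3, 16))
print(type(quantized.fc1), type(quantized.head))
```

Which layers to exclude is model specific. A common starting point is to leave the most sensitive layers (often the first or last) in float and compare accuracy after each change.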

When I try static quantization with

import torch as t

backend = "fbgemm"
model_baseline.qconfig = t.quantization.get_default_qconfig(backend)
t.backends.quantized.engine = backend
# prepare() inserts observers; note that no calibration data is run before convert()
model_static_quantized = t.quantization.prepare(model_baseline, inplace=False)
model_static_quantized = t.quantization.convert(model_static_quantized, inplace=False)

I get this error:

AssertionError: The only supported dtype for nnq.Embedding is torch.quint8

Is there any way to set the dtype beforehand?

Does this post help you? "Is it planned to support nn.Embeddings quantization?" (post #15 by supriyar)

Here are the supported qconfigs for embedding: test_quantize_fx.py in the pytorch/pytorch repository on GitHub (master branch). I think you can try setting the qconfig for the embedding ops to one of these qconfigs.
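As a rough sketch of what that could look like in eager mode, assuming a PyTorch version that ships torch.ao.quantization: the embedding gets float_qparams_weight_only_qconfig (whose weight observer uses torch.quint8) while the rest of the model keeps the default static qconfig. TinyModel and its layer names are hypothetical, and the random indices are placeholder calibration data:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, convert, prepare,
    float_qparams_weight_only_qconfig, get_default_qconfig,
)

class TinyModel(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(100, 16)
        self.quant = QuantStub()      # embedding output is float, quantize it here
        self.fc = nn.Linear(16, 4)
        self.dequant = DeQuantStub()

    def forward(self, idx):
        x = self.emb(idx)
        x = self.fc(self.quant(x))
        return self.dequant(x)

# Pick an engine that this build supports (fbgemm on x86, qnnpack on ARM).
engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend

model = TinyModel().eval()
model.qconfig = get_default_qconfig(backend)
# Override only the embedding: weight-only quantization with quint8 weights.
model.emb.qconfig = float_qparams_weight_only_qconfig

prepared = prepare(model, inplace=False)
with torch.no_grad():
    for _ in range(8):  # calibration pass; random indices are a placeholder
        prepared(torch.randint(0, 100, (32,)))
quantized = convert(prepared, inplace=False)

out = quantized(torch.randint(0, 100, (5,)))
print(type(quantized.emb))
```

The per-module override works because a qconfig set directly on a submodule takes precedence over the one inherited from its parent.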