I am trying dynamic quantization, and my model is mostly composed of linear layers, so I can't use the other quantization techniques. With dynamic quantization to qint8 the model size was reduced, but so was the accuracy. Is there any way I can improve the accuracy?
Static quantization supports Linear modules, if you'd like to try it out. Accuracy will inevitably drop, but the drop can be controlled by selectively quantizing some layers in your model (as opposed to all of them).
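As a minimal sketch of the "selectively quantize some layers" idea, still using dynamic quantization: `quantize_dynamic` accepts a `qconfig_spec` dict keyed by submodule name, so only the named layers get quantized and the rest stay in fp32. The toy model and layer names here are made up for illustration.

```python
import torch
import torch.nn as nn

# Toy model standing in for your network (names are illustrative).
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
).eval()

# Quantize only the first Linear (submodule "0"); "2" stays in fp32.
# Passing qconfig_spec={nn.Linear} instead would quantize every Linear.
quantized = torch.quantization.quantize_dynamic(
    model,
    qconfig_spec={"0": torch.quantization.default_dynamic_qconfig},
    dtype=torch.qint8,
)

out = quantized(torch.randn(3, 128))
```

You can then compare accuracy as you move more layers into the `qconfig_spec` dict, and keep the layers that hurt accuracy most in fp32.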
When I try static quantization:
import torch as t

backend = "fbgemm"
model_baseline.qconfig = t.quantization.get_default_qconfig(backend)
t.backends.quantized.engine = backend
model_static_quantized = t.quantization.prepare(model_baseline, inplace=False)
model_static_quantized = t.quantization.convert(model_static_quantized, inplace=False)
I get this error:
AssertionError: The only supported dtype for nnq.Embedding is torch.quint8
Is there any way to set the dtype beforehand?
Here are the supported qconfigs for embeddings: pytorch/test_quantize_fx.py at master · pytorch/pytorch · GitHub. You can try setting the qconfig for the embedding ops to one of these qconfigs, I think.
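A hedged sketch of that suggestion in eager mode (the `ToyModel` here is made up, your module names will differ): give the embedding its own `float_qparams_weight_only_qconfig` while the rest of the model keeps the default fbgemm qconfig. The assertion in your error is raised because the default fbgemm qconfig quantizes weights to qint8, while `nnq.Embedding` only supports quint8 weight-only quantization. Note that eager-mode static quantization also needs `QuantStub`/`DeQuantStub` around the statically quantized region and a calibration pass between `prepare` and `convert`.

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):  # stand-in for your model
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(100, 16)
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.emb(x).mean(dim=1)   # embedding output stays fp32
        x = self.quant(x)             # quantize before the static region
        x = self.fc(x)
        return self.dequant(x)

model = ToyModel().eval()

# Global qconfig for the rest of the model...
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
# ...but the embedding only supports weight-only quint8 quantization,
# which is what the AssertionError is complaining about.
model.emb.qconfig = torch.quantization.float_qparams_weight_only_qconfig

prepared = torch.quantization.prepare(model, inplace=False)
prepared(torch.randint(0, 100, (8, 5)))  # calibration with representative data
quantized = torch.quantization.convert(prepared, inplace=False)

out = quantized(torch.randint(0, 100, (2, 5)))
```

The same per-module override pattern works the other way too: set `model.some_layer.qconfig = None` to leave a troublesome layer in fp32 entirely.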