Which layers support dynamic quantization?

I’m trying to quantize my model using torch.quantization.quantize_dynamic(model, {nn.ConvTranspose1d}, dtype=torch.qint8), but my model size doesn’t decrease and neither does computation time.
However, if I add nn.Linear to the set of layers to quantize, it does seem to have an effect. Presumably this means that ConvTranspose1d doesn’t support dynamic quantization.
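For reference, here is a minimal repro of what I’m seeing (toy module names are mine, just for illustration): the Linear gets swapped for a dynamically quantized version, but the ConvTranspose1d is left untouched even when I list it explicitly.

```python
import torch
import torch.nn as nn

# toy model containing both layer types; we never run forward, only convert
model = nn.Sequential(
    nn.ConvTranspose1d(4, 4, kernel_size=3),
    nn.Linear(8, 8),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.ConvTranspose1d, nn.Linear}, dtype=torch.qint8
)

# the ConvTranspose1d is unchanged (still a float module) ...
print(type(quantized[0]))
# ... while the Linear was replaced by a dynamically quantized Linear
print(type(quantized[1]))
```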

My question is very simple: Which layers support dynamic quantization at the moment? Is a list present somewhere? I couldn’t find anything in the docs.

I’m running the model on an i7 processor (MacBook Pro), PyTorch 1.7.0 installed via pip.

Hi @ayush-1506, the list of supported dynamic quantization layers is here: https://github.com/pytorch/pytorch/blob/master/torch/quantization/quantization_mappings.py#L76
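In recent PyTorch versions you can also query that mapping programmatically rather than reading the source (the accessor name below is from newer releases and may not exist in 1.7):

```python
import torch.nn as nn
from torch.quantization.quantization_mappings import (
    get_default_dynamic_quant_module_mappings,
)

# keys are float module classes, values are their dynamically quantized replacements
mapping = get_default_dynamic_quant_module_mappings()
for float_mod, quant_mod in mapping.items():
    print(float_mod.__name__, "->", quant_mod.__name__)
```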

Thanks for sharing this, @Vasiliy_Kuznetsov! Are there plans to support other layers?

I’m not aware of plans to add more layers to dynamic quantization specifically. What would be the use case?

The currently supported layers look like a very small subset of all available out-of-the-box layers. Suppose I’m interested in quantizing the convolutional layers (Conv2d, Conv1d) of a convnet. That won’t work at the moment, right?


@ayush-1506, you can check out static quantization or QAT which support convolutions.

Thanks, will check it out. I’ve statically quantized a model and stored the state dict locally, but loading it back gives me this error:
RuntimeError: Could not run 'quantized::conv_transpose1d_prepack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv_transpose1d_prepack' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].

Any idea what this is about? Thanks again

This error message means that you are passing an fp32 tensor to a quantized layer. The way to fix it is to use QuantStub and DeQuantStub to control the conversions; feel free to check out the examples at https://pytorch.org/docs/stable/quantization.html .
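To make that concrete, here is a minimal static-quantization sketch with the stubs in place (the tiny model and shapes are made up for illustration; 'fbgemm' assumes an x86 CPU):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> quantized
        self.conv = nn.Conv2d(1, 1, kernel_size=1)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> fp32

    def forward(self, x):
        x = self.quant(x)
        x = self.conv(x)
        x = self.relu(x)
        return self.dequant(x)

model = M().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)
prepared(torch.randn(1, 1, 4, 4))           # calibration pass with sample data
quantized = torch.quantization.convert(prepared)
out = quantized(torch.randn(1, 1, 4, 4))    # now runs on quantized kernels
```

Because the QuantStub converts the input before it reaches the quantized conv, you avoid the "Could not run ... with arguments from the 'CPU' backend" error.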