Static/Dynamic Quantization

I tried quantizing a model using both static and dynamic quantization. Both schemes quantized the weights of the layers but not the biases. Is there a reason for this, and how can I quantize the biases?

My implementation is similar to this

Biases are not quantized; they are kept in fp32. For convs and linears, the bias is dynamically quantized before the addition while the conv/linear runs.
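A quick way to observe this (a sketch, assuming a recent PyTorch where dynamic quantization lives under `torch.ao.quantization`): dynamically quantize a small model and inspect the dtypes the quantized Linear actually stores.

```python
import torch
import torch.nn as nn

# Tiny model with a single Linear layer
model = nn.Sequential(nn.Linear(4, 2))

# Dynamically quantize only the Linear modules to int8
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

lin = qmodel[0]
print(lin.weight().dtype)  # weights are quantized (torch.qint8)
print(lin.bias().dtype)    # bias is still torch.float32
```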

Hi, thanks for your reply.
If you're saying that biases are not quantized, what do you then mean by "biases are dynamically quantized" for linears?

When the linear is run, it converts the bias to int32 before adding it to the matmul result.
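Here is a hypothetical 1x1 "matmul" in plain Python (all scales and values are made up) showing what that run-time conversion amounts to: the int8-times-int8 accumulator carries scale `act_scale * weight_scale`, so the fp32 bias is rounded to int32 at that same scale and added directly.

```python
# Assumed (made-up) quantization scales
act_scale = 0.05      # activation scale, only known at run time
weight_scale = 0.02   # weight scale, fixed at quantization time

x_fp, w_fp, b_fp = 1.25, -0.5, 0.3   # fp32 input, weight, bias

q_x = round(x_fp / act_scale)        # int8 activation
q_w = round(w_fp / weight_scale)     # int8 weight
acc = q_x * q_w                      # int32 accumulator, scale = act*weight

# Bias converted to int32 at the accumulator's scale, then added
q_b = round(b_fp / (act_scale * weight_scale))
acc += q_b

# Dequantize the result back to fp32
y = acc * (act_scale * weight_scale)
print(y)  # matches x*w + b = -0.325
```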

I see. Do you have any idea why PyTorch does it like that?

If quantized, biases are usually quantized with scale = activation_scale * weight_scale, so that the quantized bias can be added directly to the matmul output in the quantized domain. In PyTorch eager mode, due to the dynamic nature of the PyTorch graph, knowing the activation scale statically is impossible.
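A small plain-Python sketch (made-up scales) of why the bias integer cannot be precomputed in the dynamic case: the activation scale is derived from each incoming batch, so the same fp32 bias maps to a different int32 value per batch.

```python
weight_scale = 0.02   # fixed at quantization time
bias = 0.3            # fp32 bias kept by the module

def act_scale(batch):
    # Simple symmetric scale: map the batch's max magnitude to int8 range
    return max(abs(v) for v in batch) / 127.0

q_biases = []
for batch in ([0.5, -1.0], [2.0, 0.1]):
    s = act_scale(batch)  # only known once the batch arrives
    q_biases.append(round(bias / (s * weight_scale)))

print(q_biases)  # two different int32 values for the same fp32 bias
```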