Question about quantization tutorial and fusing model

Hi, I am new to deep learning and PyTorch. I am interested in quantization and have gone through the Transfer Learning and Post Training Static Quantization tutorials. However, I have a few questions that I hope the community can help with.

For Transfer learning:

  • I noticed that the quantized model implements a custom head for fine-tuning. However, the model is restructured by the following code:

    from torch import nn

    def create_combined_model(model_fe):
        # Step 1. Isolate the feature extractor.
        model_fe_features = nn.Sequential(
            model_fe.quant,  # Quantize the input
            # ...
            model_fe.dequant,  # Dequantize the output
        )

        # Step 2. Create a new "head"
        new_head = nn.Sequential(
            nn.Linear(num_ftrs, 2),
        )

        # Step 3. Combine, and don't forget the quant stubs.
        new_model = nn.Sequential(
            # ...
        )
        return new_model

  • Why is there no new forward() defined for it? Can the forward() recognize the new network layout for the model automatically?
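On the forward() question: nn.Sequential already defines forward(), which passes the input through its children in the order they were given, so a model built only from nn.Sequential containers does not need a hand-written forward(). Here is a minimal sketch with a small stand-in network. The layers and sizes are made up for illustration and are not the tutorial's ResNet.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in feature extractor (hypothetical layers, not the tutorial's model).
features = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
)

# Stand-in head with two output classes.
head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(4, 2),
)

# No forward() is written here: nn.Sequential chains its children in order.
combined = nn.Sequential(features, nn.Flatten(1), head)
combined.eval()  # eval mode so Dropout is a no-op

x = torch.randn(5, 3, 8, 8)
with torch.no_grad():
    out = combined(x)
    # The same computation written out by hand.
    manual = head(torch.flatten(features(x), 1))
```

A custom forward() is only needed when the data flow is not a simple chain, for example with skip connections or multiple inputs.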

For Post Training Static Quantization:

  • I noticed that the tutorial converts the pretrained model to a quantized model by merging intermediate operations such as nn.Conv2d(), nn.ReLU() and nn.BatchNorm2d() into ConvBnReLU2d(). I know that the Quantization guidelines suggest fusing operations whenever a model is quantized. But is it possible to use each quantized module independently, without fusing them into one? I believe I have seen quantized implementations such as nn.quantized.Conv2d() and nn.quantized.ReLU() (although there is no nn.quantized.BatchNorm2d() yet).
  • I am asking because I would like to extract and inspect the intermediate outputs, such as the outputs of nn.quantized.Conv2d() and nn.quantized.ReLU(), independently. I believe that if I fuse the modules using ConvBnReLU2d, I would only get the final output that has gone through BatchNorm2d() and ReLU(), rather than the output of each intermediate operation, right?
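One general way to inspect intermediate outputs is a forward hook, which nn.Module provides through register_forward_hook. The sketch below uses a small unfused float model for illustration. Quantized modules are also nn.Module instances, so the same mechanism should apply to them, but that case is not run here. The helper make_hook and the dictionary captured are hypothetical names.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Small unfused stand-in block (hypothetical, for illustration only).
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(),
).eval()

captured = {}  # child name -> output tensor of that child

def make_hook(name):
    # Build a hook that stores the output of the module it is attached to.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Attach one hook per direct child; in nn.Sequential these are named "0", "1", "2".
handles = [
    child.register_forward_hook(make_hook(name))
    for name, child in model.named_children()
]

with torch.no_grad():
    y = model(torch.randn(1, 3, 8, 8))

# Remove the hooks once the outputs have been collected.
for handle in handles:
    handle.remove()
```

After the forward pass, captured["0"] holds the convolution output, captured["1"] the batchnorm output and captured["2"] the ReLU output. Once modules are fused, the corresponding intermediate results are no longer separate module outputs, so hooks can only see the output of the fused module.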

I am new to this community and this is my first post. If this post does not follow the community guideline, please let me know. Thank you.

For Post Training Static Quantization:
I think you can leave Conv2d and ReLU separate, but it will hurt performance. It could work for debugging purposes. For batchnorm, you have to fuse it with Conv since there’s no quantized batchnorm.
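A minimal sketch of that option in eager mode, assuming torch.ao.quantization.fuse_modules is available (older releases expose it as torch.quantization.fuse_modules). Only Conv2d and BatchNorm2d are fused and ReLU is left as its own module. The model is a made-up stand-in, and the sketch stops before the actual quantization steps.

```python
import torch
from torch import nn
from torch.ao.quantization import fuse_modules

torch.manual_seed(0)

model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(),
)

# Give batchnorm non-trivial running statistics so the comparison is meaningful.
with torch.no_grad():
    model[1].running_mean.uniform_(-1.0, 1.0)
    model[1].running_var.uniform_(0.5, 2.0)

model.eval()  # conv + bn fusion for inference expects eval mode

# Fuse only children "0" (conv) and "1" (bn); child "2" (relu) stays separate.
partly_fused = fuse_modules(model, [["0", "1"]])

x = torch.randn(2, 3, 8, 8)
with torch.no_grad():
    same_output = torch.allclose(model(x), partly_fused(x), atol=1e-5)
```

After this call, partly_fused[0] is a plain nn.Conv2d with the batchnorm folded into its weights, partly_fused[1] is an nn.Identity placeholder, and partly_fused[2] is still an nn.ReLU whose input and output can be inspected separately.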

cc @Zafar for the question on transfer learning.


@JC_DL we do support quantized conv, quantized relu and quantized batchnorm operators. So it should be possible to execute these operators standalone as well.


Hi, may I know where I can find out more about quantized batchnorm? I did not see BatchNorm2d() listed under torch.nn.quantized on the Quantization page. Thanks.

I believe the docs haven’t been updated. Will do so.
Here is the code for quantized batchnorm -



  1. Can you please explain why there is a performance impact if we don’t fuse the layers?
  2. What exactly does fusing layers mean?
  3. What does ConvReLU2d mean? What has happened to the batch normalization layer?

Please help me to understand!
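As a sketch of what fusing means at inference time: batchnorm in eval mode is a fixed per-channel scale and shift, so it can be folded into the preceding convolution's weights and bias, and the ReLU is then applied by the same module. The layer sizes are made up, the folding is written out by hand for illustration, and torch.ao.quantization.fuse_modules is assumed to be available (torch.quantization.fuse_modules in older releases).

```python
import torch
from torch import nn
from torch.ao.quantization import fuse_modules

torch.manual_seed(0)

conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(4)
with torch.no_grad():
    # Non-trivial batchnorm parameters and statistics for a meaningful check.
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-0.5, 0.5)
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
model = nn.Sequential(conv, bn, nn.ReLU()).eval()

# Manual folding: bn(conv(x)) = scale * (W*x + b - mean) + beta,
# with scale = gamma / sqrt(var + eps), all per output channel.
folded = nn.Conv2d(3, 4, kernel_size=3, padding=1)
with torch.no_grad():
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    folded.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    folded.bias.copy_((conv.bias - bn.running_mean) * scale + bn.bias)
manual = nn.Sequential(folded, nn.ReLU()).eval()

# Library fusion of all three children of the eval-mode model.
fused = fuse_modules(model, [["0", "1", "2"]])
fused_type = type(fused[0]).__name__

x = torch.randn(2, 3, 8, 8)
with torch.no_grad():
    reference = model(x)
    manual_ok = torch.allclose(reference, manual(x), atol=1e-5)
    fused_ok = torch.allclose(reference, fused(x), atol=1e-5)
```

In eval mode the fused module is a ConvReLU2d: the batchnorm has not been dropped, it has been absorbed into the convolution's weights and bias, which is why no separate batchnorm layer remains. Running one module instead of three also avoids writing and re-reading the intermediate tensors, which is a likely source of the performance difference mentioned above.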
