Static quantization returning torch.float32 parameters


I am currently following the official PyTorch static quantization tutorial to quantize my model. The model uses a ResNet backbone and adds a few other components, such as Generalized Mean Pooling; the complete code can be accessed here in this gist.

I was able to follow the tutorial steps and run the torch.quantization.convert(myModel, inplace=True) line to finally quantize my model, but when I check the size of the model, it is almost the same as before quantization (186 MB before, 174 MB after).

The main code is this (complementary to the gist):

import os

import torch

# My own function to load the checkpoint
ckpt = common.load_checkpoint(settings.CKPT_MODEL, False)
# The definition of QuantizableRMAC is in the gist,
# but it is similar to how it is done in the tutorial
qnet = QuantizableRMAC(**ckpt["model_options"])
qnet.fuse_model() # Fuse model here!!!!
qnet.qconfig = torch.quantization.default_qconfig
torch.quantization.prepare(qnet, inplace=True)
data_loader = quantizable_dataloader("path/to/images")

def evaluate(model, data_loader, neval_batches):
    # Calibration pass: run a few batches so the observers record activation ranges
    cnt = 0
    with torch.no_grad():
        for image, _ in data_loader:
            model(image)
            cnt += 1
            if cnt >= neval_batches:
                return

evaluate(qnet, data_loader, 1)
quantized_net = torch.quantization.convert(qnet)

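def print_size_of_model(model):
    # Serialize the state dict to a temporary file and report its size on disk
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)

print_size_of_model(quantized_net)

for p in quantized_net.parameters():
    print(p.dtype)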

The last line prints only torch.float32.
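A note on what parameters() can show here: as far as I understand, converted quantized modules keep their weights in a packed form and not as nn.Parameter objects, so iterating over parameters() would not list int8 weights even when conversion succeeded. The sketch below is only an illustration under assumptions: TinyNet is a hypothetical stand-in (not the model from the gist), the calibration data is random, and it assumes a PyTorch build that still ships eager-mode torch.quantization. It shows one way to read a converted layer's weight dtype through its weight() method.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model, not the QuantizableRMAC from the gist."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))


model = TinyNet().eval()
# Fuse conv + relu, attach observers, calibrate on random data, then convert
torch.quantization.fuse_modules(model, [["conv", "relu"]], inplace=True)
model.qconfig = torch.quantization.default_qconfig
torch.quantization.prepare(model, inplace=True)
with torch.no_grad():
    model(torch.randn(4, 3, 16, 16))
quantized = torch.quantization.convert(model)

# parameters() only yields tensors registered as nn.Parameter
float_params = [p.dtype for p in quantized.parameters()]
# Quantized conv modules expose their weight through a method
conv_weight_dtype = quantized.conv.weight().dtype

print("dtypes from parameters():", float_params)
print("conv weight dtype:", conv_weight_dtype)
```

In this toy case float_params is empty while the conv weight reports a quantized dtype, so the dtype loop alone does not tell you whether a layer was converted.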

Hi Paulo,
I am not able to access the gist. Can you fix that? The main code looks OK, except that it is missing the module fusion step (fuse_model() in the tutorial, which is built on torch.quantization.fuse_modules); fusion is required for quantization to work. Quantization does not currently support unfused batch norms.
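To make the fusion step concrete, here is a minimal sketch of fusing a Conv2d + BatchNorm2d + ReLU sequence with torch.quantization.fuse_modules. The three-layer nn.Sequential block is a hypothetical example, not taken from the gist or the tutorial, and the model is put in eval mode first since that is the mode used for post-training fusion.

```python
import torch
import torch.nn as nn

# Hypothetical conv -> bn -> relu block, for illustration only
block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(),
).eval()

# Submodules of a Sequential are addressed by their index names "0", "1", "2".
# With the default inplace=False, a fused copy is returned and block is untouched.
fused = torch.quantization.fuse_modules(block, [["0", "1", "2"]])

# The first slot holds the fused module; the other two become placeholders
fused_types = [type(m).__name__ for m in fused]
print(fused_types)
```

Printing the module types like this is a quick way to confirm that no standalone BatchNorm2d is left before calling prepare.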

Hello @raghuramank100, thanks for your reply!

I think you can now access the gist. In fact, I do call the fuse_model() method of the QuantizableRMAC class (in the gist), which in turn calls the ResNet fuse_model() method, the same way you wrote the original code here. I just left this part out when copying the code into this post. I am going to fix the first post so that it matches what I actually run.

However, when I inspect the model by printing its module names after calling fuse_model(), I see that all the conv, bn and relu layers were fused successfully, so I am not sure what is happening.
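Beyond checking the fused names, it may also help to scan the converted model for modules that are still plain floating-point layers. The following is a sketch under assumptions, not a diagnosis of the model in the gist: PartlyQuantized and float_leftovers are hypothetical names, and the fc layer is deliberately excluded from quantization (by setting its qconfig to None) just to produce a leftover float module to find.

```python
import torch
import torch.nn as nn


class PartlyQuantized(nn.Module):
    """Hypothetical model where only the conv sits between the quant stubs."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 4, kernel_size=3)
        self.dequant = torch.quantization.DeQuantStub()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        x = self.dequant(self.conv(self.quant(x)))
        return self.fc(x.mean(dim=(2, 3)))


def float_leftovers(model):
    """Return names of modules whose exact type is still a float layer."""
    float_types = (nn.Conv2d, nn.BatchNorm2d, nn.Linear)
    return [name for name, m in model.named_modules() if type(m) in float_types]


model = PartlyQuantized().eval()
model.qconfig = torch.quantization.default_qconfig
model.fc.qconfig = None  # assumption for the demo: keep fc in float
torch.quantization.prepare(model, inplace=True)
with torch.no_grad():
    model(torch.randn(2, 3, 8, 8))  # random calibration data
quantized = torch.quantization.convert(model)

leftovers = float_leftovers(quantized)
param_dtypes = {p.dtype for p in quantized.parameters()}
print("still float:", leftovers)
print("dtypes from parameters():", param_dtypes)
```

In this toy setup the only float32 parameters belong to the module that was left unconverted, which is one possible way a loop over parameters() could end up printing torch.float32.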

I made a minimal working example here. I use the code from (I copied the code into a Jupyter notebook)


However, it does not work; it raises this error:


When I remove the model.eval() call, it does not raise any error, but it still does not work as expected:


This uses the same ResNet, QuantizableResNet, and QuantizableBottleneck code as the tutorials. These pictures were taken with this minimal reproducible code.
Thoughts? @raghuramank100 @jerryzh168