During Quantization-Aware Training, the model parameter datatype is float32

Rami_Ismael · September 12, 2022, 9:47pm

After calling torch.ao.quantization.convert following Quantization-Aware Training, I loop through each parameter of the model. Every parameter's datatype is float32, not int8.

This shows a quick, small example of Quantization-Aware Training that I wrote to understand how it works: Google Colab

jerryzh168 · September 12, 2022, 11:56pm

This is expected. Quantization-aware training only simulates the numerics of quantization with fake-quantization modules; it does not actually quantize the model. To get a real quantized model, you'll need to call convert on the model.
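To see what this means in practice, here is a minimal sketch, assuming a recent PyTorch with the eager-mode torch.ao.quantization API; TinyNet is a hypothetical toy model, not the one from the Colab notebook. After prepare_qat, the weights are still plain float32 parameters. What changes is that fake-quantization modules have been attached to simulate int8 rounding during training.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


# Hypothetical toy model; QuantStub/DeQuantStub mark where eager-mode
# quantization starts and ends.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = nn.Linear(4, 2)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


model = TinyNet().train()  # prepare_qat expects a model in training mode
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
qat_model = tq.prepare_qat(model)  # returns a prepared copy (inplace=False)

# The learnable weights are still ordinary float32 tensors...
param_dtypes = {name: p.dtype for name, p in qat_model.named_parameters()}
print(param_dtypes)

# ...but fake-quantization modules were inserted to simulate int8 numerics.
fake_quant_names = [
    name for name, m in qat_model.named_modules() if isinstance(m, tq.FakeQuantize)
]
print(fake_quant_names)
```

Only a later convert call replaces these simulated modules with real quantized ones.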

Rami_Ismael · September 13, 2022, 12:17am

As I understand it, I called torch.ao.quantization.convert(model) to convert my model to int8. Second, how can I make sure my model has been quantization-aware trained and quantized to int8?

jerryzh168 · September 13, 2022, 1:05am

Can you take a look at Quantization — PyTorch master documentation?

christophezei · September 14, 2022, 3:55pm

Once you call convert, a quick way to make sure your model is quantized is to check its size using:

# Bytes held in the converted model's parameters
param_size = 0
for param in converted_model.parameters():
    param_size += param.nelement() * param.element_size()
# Bytes held in the converted model's buffers
buffer_size = 0
for buffer in converted_model.buffers():
    buffer_size += buffer.nelement() * buffer.element_size()

size_all_mb = (param_size + buffer_size) / 1024**2

You can then run inference with this converted model. Note that, currently, PyTorch's convert method only supports 8-bit quantization.
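Counting parameters() and buffers() may undercount a converted model, because eager-mode quantized layers hold their weights in packed form rather than as parameters or buffers. A sketch of an alternative check, assuming the same eager-mode API: serialize each model's state_dict to an in-memory buffer and compare byte counts. TinyNet, the 512x512 layer size, the backend selection, and the few forward passes standing in for real QAT training are all illustrative assumptions.

```python
import copy
import io

import torch
import torch.nn as nn
import torch.ao.quantization as tq


def state_dict_size_mb(model: nn.Module) -> float:
    """Size of the model's serialized state_dict, in MiB."""
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes / 1024**2


# Hypothetical toy model standing in for a real network.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = nn.Linear(512, 512)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


# Pick a quantized backend that this PyTorch build supports.
engine = "fbgemm" if "fbgemm" in torch.backends.quantized.supported_engines else "qnnpack"
torch.backends.quantized.engine = engine

float_model = TinyNet()
qat_model = copy.deepcopy(float_model).train()
qat_model.qconfig = tq.get_default_qat_qconfig(engine)
tq.prepare_qat(qat_model, inplace=True)

# A few forward passes stand in for a real QAT training loop (they update observers).
with torch.no_grad():
    for _ in range(3):
        qat_model(torch.randn(8, 512))

qat_model.eval()
int8_model = tq.convert(qat_model)  # returns a converted copy

float_mb = state_dict_size_mb(float_model)
int8_mb = state_dict_size_mb(int8_model)
print(f"float32: {float_mb:.3f} MiB, int8: {int8_mb:.3f} MiB")
```

With a 512x512 Linear layer, the int8 state_dict should come out at roughly a quarter of the float32 one, since each weight drops from 4 bytes to 1; the exact figure depends on the scale and zero-point metadata stored alongside the weights.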

Rami_Ismael · September 14, 2022, 5:34pm

I used your approach. However, when I call parameters() on the converted model, it is empty, while buffers() on the converted model returns two elements. The model is a ResNet.
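A likely explanation, offered as a sketch rather than a confirmed diagnosis of the Colab notebook: in eager-mode PyTorch, converted layers such as the quantized Linear and Conv2d keep their weights in packed form rather than as nn.Parameter objects, so parameters() comes back empty, and the only remaining buffers are the scale and zero_point of the Quantize module that replaced the input QuantStub. The toy model below (TinyNet, a hypothetical stand-in for the ResNet) reproduces this; the backend selection and the forward passes standing in for training are assumptions.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


# Hypothetical toy model standing in for the ResNet.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = nn.Linear(4, 2)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


engine = "fbgemm" if "fbgemm" in torch.backends.quantized.supported_engines else "qnnpack"
torch.backends.quantized.engine = engine

model = TinyNet().train()
model.qconfig = tq.get_default_qat_qconfig(engine)
tq.prepare_qat(model, inplace=True)
with torch.no_grad():
    for _ in range(3):
        model(torch.randn(8, 4))  # stand-in for the QAT training loop
model.eval()
qmodel = tq.convert(model)

# Quantized layers keep their weights packed, not as nn.Parameters.
print(list(qmodel.parameters()))
# The only buffers left: scale and zero_point of the input Quantize module.
print([name for name, _ in qmodel.named_buffers()])
# The weight is reachable through the weight() method as a quantized tensor.
w = qmodel.fc.weight()
print(w.dtype, w.int_repr().dtype)
```

So an empty parameters() list is consistent with a successful conversion; checking that weight() returns a torch.qint8 tensor is a more direct test.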

christophezei · September 14, 2022, 6:03pm

I looked at your code in Colab; this should fix your problem:

model = train_model(model=model, train_loader=train_loader, test_loader=test_loader, device=cuda_device, learning_rate=1e-3, num_epochs=5)

# Then convert the trained model: move it to the CPU and switch to eval mode first
model.to("cpu")
model.eval()
model = torch.ao.quantization.convert(model, inplace=True)

Try this and let me know if it works.

Rami_Ismael · September 14, 2022, 7:38pm

I made the modification, so model is now the trained model. The same problem still exists: getting the parameters of the converted model returns nothing.

christophezei · September 14, 2022, 10:10pm

I am not sure why this is happening. I tried to debug your code a little, but it's hard to debug in a Colab environment, especially when working on quantization. Anyway, I suggest you first try a very simple example, such as the one in the documentation; working through it should help answer your questions.

Rami_Ismael · September 14, 2022, 11:32pm

I took a single example from the PyTorch quantization documentation, the one demonstrating Quantization-Aware Training. The converted model does have parameters.

christophezei · September 15, 2022, 8:02am

Yes, that's normal, and this is why I want you to print your new model's size: it can confirm that your model is compressed. Let me explain. If you print a quantized tensor, you will see the floating-point values, a scale, and a zero point. The underlying representation is stored as integers, and you can see it with int_repr():

model.name_of_the_quantized_layer.weight().int_repr()

Check the mapping formula to understand how your quantized weights are represented in floating-point or integer form, so you can verify for yourself that the weights you are seeing are just displayed in a different format. I hope that was clear; if you still have questions, let me know. I recommend reading more about quantization theory, and about how the process works in PyTorch using the documentation, before digging into the code, so you can better understand what is happening.
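For the per-tensor affine scheme, the mapping is: a float value x is stored as q = round(x / scale) + zero_point (clamped to the integer range), and read back as x ≈ (q - zero_point) * scale. A small sketch using torch.quantize_per_tensor on a hand-picked tensor; the scale of 0.01 and zero point of 10 are arbitrary values chosen only for illustration.

```python
import torch

x = torch.tensor([-1.0, 0.0, 0.5, 1.0])
scale, zero_point = 0.01, 10

# Affine quantization: q = round(x / scale) + zero_point, clamped to the int8 range.
qx = torch.quantize_per_tensor(x, scale=scale, zero_point=zero_point, dtype=torch.qint8)

q = qx.int_repr()  # the stored integers: -90, 10, 60, 110
print(q)
print(qx.q_scale(), qx.q_zero_point())

# Mapping back: x_hat = (q - zero_point) * scale, which is what dequantize() computes.
x_hat = (q.to(torch.float32) - zero_point) * scale
print(torch.allclose(x_hat, qx.dequantize()))

# Printing the quantized tensor shows the dequantized floats plus scale and zero_point.
print(qx)
```

The same int_repr() call on a converted layer's weight() shows the integers actually stored, while printing the weight shows their floating-point interpretation.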