After running quantization-aware training and then calling torch quantization convert, I looped through each parameter of the model. Every parameter still has datatype float32, not int8.
Here is a quick, small example of quantization-aware training that I put together to understand how it works: Google Colab
This is expected. Quantization-aware training only simulates the numerics of quantization using fake-quantization modules; it does not actually quantize the model. To get a real quantized model, you need to call convert on the model.
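To make the "only simulates the numerics" point concrete, here is a minimal sketch of what a fake-quantization step does. It is my own illustration, not taken from the Colab notebook, and the input values, scale, and zero point are made up. The values get snapped to the int8 grid, but the tensor itself stays float32.

```python
import torch

# Hypothetical values chosen only for illustration.
x = torch.tensor([-1.0, -0.3, 0.0, 0.4, 1.0])
scale, zero_point = 0.25, 0

# Fake quantization: round onto the int8 grid, then dequantize right away.
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, -128, 127)

print(y)        # values are now multiples of `scale`
print(y.dtype)  # still a float dtype; nothing is stored as int8
```

This is why inspecting dtypes during or right after QAT shows float32 everywhere.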
As I understand it, I called torch.ao.quantization.convert(model) to transform my model to int8. Second, how do I make sure my model has actually gone through quantization-aware training for int8?
Once you have called convert, a quick way to check that your model is quantized is to look at its size:
param_size = 0
for param in converted_model.parameters():
    param_size += param.nelement() * param.element_size()
buffer_size = 0
for buffer in converted_model.buffers():
    buffer_size += buffer.nelement() * buffer.element_size()
size_all_mb = (param_size + buffer_size) / 1024**2
print(f"model size: {size_all_mb:.3f} MB")
You can then run inference with this converted model. Note that, currently, convert in PyTorch only supports 8-bit quantization.
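One caveat about size checks based on parameters(): as far as I know, converted quantized modules keep their weights in packed form rather than as regular nn.Parameter objects, so parameters() can come back empty for them. A check that does not depend on this is to serialize the state_dict and measure the number of bytes. Below is a minimal sketch; the helper name state_dict_size_mb and the toy nn.Linear model are hypothetical and only there for illustration.

```python
import io

import torch
import torch.nn as nn


def state_dict_size_mb(model: nn.Module) -> float:
    """Serialize the model's state_dict in memory and return its size in MB."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return len(buffer.getvalue()) / 1024**2


# Toy float32 model, used here only to show the call.
float_model = nn.Linear(256, 256)
size_mb = state_dict_size_mb(float_model)
print(f"float model: {size_mb:.3f} MB")
```

Calling the same helper on the model before and after convert should show the difference; for an int8 model the serialized size is typically much smaller than the float32 one.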
I used your approach. However, when I call parameters() on the converted model, it is empty, while buffers() on the converted model returns two elements. The model is a ResNet.
I looked at your code in Colab. This should fix your problem:
model = train_model(model=model, train_loader=train_loader, test_loader=test_loader, device=cuda_device, learning_rate=1e-3, num_epochs=5)
# then convert the trained model
model = torch.ao.quantization.convert(model, inplace=True)
Try this and let me know if it works.
I made the modification so that model is now the newly trained model. The same problem still exists: getting the parameters of the converted model returns None.
I am not sure why this is happening. I tried to debug your code a little, but it is hard to debug in a Colab environment, especially when working on quantization. Anyway, I suggest you start with a very simple example, such as the one in the documentation. With that example you can get a better understanding of your questions.
I picked a single example from the PyTorch Quantization documentation, the one that demonstrates quantization-aware training. The converted model does have parameters.
Yes, that's normal, and this is why I wanted you to print your new model size: it can confirm that your model is compressed. Let me explain. If you print a quantized tensor, you will see the floating-point values, a scale, and a zero point. The underlying representation is stored as integers, and you can see it with int_repr().
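A small sketch of this, with a made-up tensor, scale, and zero point (none of these come from your model):

```python
import torch

# Hypothetical values chosen only for illustration.
x = torch.tensor([-1.0, 0.0, 0.5, 1.0])
q = torch.quantize_per_tensor(x, scale=0.25, zero_point=10, dtype=torch.quint8)

print(q)               # printed as float values, plus scale and zero_point
print(q.int_repr())    # the integers that are actually stored
print(q.dequantize())  # back to a regular float tensor

stored = q.int_repr()
```

So the float values you see when printing are just a view of the stored integers through the scale and zero point.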
Check the mapping formula to understand how your quantized weights are represented in either floating point or int, so you can verify for yourself that the weights you are seeing are the same values displayed in a different format. I hope that was clear; if you still have questions, let me know. I recommend reading more about quantization in theory, and about how the process works in PyTorch, using the documentation before digging into the code. That way you can better understand what is happening.
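For reference, the usual affine mapping is q = clamp(round(x / scale) + zero_point, qmin, qmax), and going back is x ≈ (q - zero_point) * scale. Here is a minimal plain-Python sketch of it; the function names and the example numbers are my own, chosen for illustration, and assume an unsigned 8-bit range.

```python
def quantize(x: float, scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> int:
    """Map a float to its integer representation (affine quantization)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the integer range


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map the stored integer back to the float value it represents."""
    return (q - zero_point) * scale


# Hypothetical scale and zero point.
scale, zero_point = 0.25, 10
for value in [-1.0, 0.0, 0.5, 1.0]:
    q = quantize(value, scale, zero_point)
    print(value, "->", q, "->", dequantize(q, scale, zero_point))
```

Applying dequantize to the integers from int_repr(), with the tensor's own scale and zero point, should reproduce the float values shown when the quantized tensor is printed.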