I am new to quantization and am following the official tutorial. I've noticed that the accuracy after calibration, but before converting (using the official tutorial code below), is lower than the accuracy of the original float_model. Here is the relevant part of the code:
myModel.qconfig = torch.ao.quantization.default_qconfig
print(myModel.qconfig)
torch.ao.quantization.prepare(myModel, inplace=True)
# Calibrate first
print('Post Training Quantization Prepare: Inserting Observers')
print('\n Inverted Residual Block:After observer insertion \n\n', myModel.features[1].conv)
# Calibrate with the training set
top1, top5 = evaluate(myModel, criterion, data_loader, neval_batches=num_calibration_batches)
print('Post Training Quantization: Calibration done')
The top1 and top5 returned by this calibration run are worse than the original float32 model's top1 and top5 on the same dataset (evaluated with the line below):
top1, top5 = evaluate(float_model, criterion, data_loader, neval_batches=num_calibration_batches)
From my understanding, calibration should only gather statistics (such as the ranges of activations) and shouldn't alter the network's computation at this stage; I thought any computational changes would only occur after calling torch.ao.quantization.convert. Could someone clarify whether a decrease in accuracy during calibration is expected?
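To make the question concrete, here is a minimal sanity check I would expect to pass if my understanding is right (a sketch with a toy model, not the tutorial's MobileNet: the model, input shape, and names here are made up for illustration). It compares a float model against its observer-prepared copy on the same input; if prepare() only inserts pass-through observers, the outputs should match exactly:

```python
import copy

import torch
import torch.ao.quantization

# Toy float model standing in for the tutorial's network
float_model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
float_model.eval()

# Prepare a copy: this attaches observers but (per my understanding)
# should not change the forward computation
prepared = copy.deepcopy(float_model)
prepared.qconfig = torch.ao.quantization.default_qconfig
torch.ao.quantization.prepare(prepared, inplace=True)
prepared.eval()

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    max_diff = (float_model(x) - prepared(x)).abs().max().item()
# Expectation: max_diff is 0.0, because observers record min/max
# statistics but return their input tensor unchanged
```

If this holds, the accuracy gap I'm seeing would have to come from something else (e.g. different batches being drawn from the data loader between the two evaluate calls) rather than from the observers themselves.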
Thank you for your insights!