Accuracy Drop During Calibration (before Conversion) in Post Training Quantization

I am new to quantization and am following this official tutorial. I've noticed that the accuracy after calibration, before converting (using the official tutorial code below), is lower than the accuracy of the original float_model. Here is the relevant part of the code:

myModel.qconfig = torch.ao.quantization.default_qconfig
print(myModel.qconfig)
torch.ao.quantization.prepare(myModel, inplace=True)

# Calibrate first
print('Post Training Quantization Prepare: Inserting Observers')
print('\n Inverted Residual Block:After observer insertion \n\n', myModel.features[1].conv)

# Calibrate with the training set
top1, top5 = evaluate(myModel, criterion, data_loader, neval_batches=num_calibration_batches)
print('Post Training Quantization: Calibration done')

The top1 and top5 returned from the calibration run are worse than the original float32 model's top1 and top5 on the same dataset (obtained with the line of code below):

top1, top5 = evaluate(float_model, criterion, data_loader, neval_batches=num_calibration_batches)

From my understanding, calibration should only gather statistics such as the range of activations, and should not alter the network's computations at this stage. I thought any computational changes would only occur after calling torch.ao.quantization.convert. Could someone clarify whether a decrease in accuracy during calibration is expected?
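
For reference, a quick way to confirm that understanding would be something along these lines (a minimal sketch, assuming myModel and float_model were loaded from the same checkpoint and fused identically, as in the tutorial):

# Sanity check (sketch): the observers inserted by prepare() only record
# statistics during the forward pass, so the prepared model's outputs should
# match the float model's exactly (this feeds one extra batch of statistics
# to the observers, which is harmless).
images, _ = next(iter(data_loader))
with torch.no_grad():
    out_float = float_model(images)
    out_prepared = myModel(images)  # myModel already has observers inserted
print(torch.allclose(out_float, out_prepared))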

Thank you for your insights!

Hi @bcxiao,

Are you comparing the top1 and top5 accuracy to that produced by this line in the tutorial?

top1, top5 = evaluate(float_model, criterion, data_loader_test, neval_batches=num_eval_batches)

It is expected for the two results to be different, not because the prepared model computes anything differently, but because num_eval_batches != num_calibration_batches (and that line evaluates on data_loader_test rather than data_loader). We don't need to calibrate on all of the evaluation data, so your calibration run only evaluates a subset of it, which means the two metrics are not a fair comparison.

That said, I would expect them to be the same, or at least very similar, if you changed num_calibration_batches to num_eval_batches. Can you try making that change and seeing what you get?
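
Something like the following sketch would make the two numbers directly comparable (it reuses the tutorial's evaluate helper and variable names; the .avg attribute assumes the tutorial's AverageMeter return values):

# Evaluate both models on the same loader with the same number of batches,
# so the accuracies can be compared directly.
num_calibration_batches = num_eval_batches

top1_float, top5_float = evaluate(float_model, criterion, data_loader, neval_batches=num_calibration_batches)
top1_prep, top5_prep = evaluate(myModel, criterion, data_loader, neval_batches=num_calibration_batches)

print('float    model: top1 %.2f, top5 %.2f' % (top1_float.avg, top5_float.avg))
print('prepared model: top1 %.2f, top5 %.2f' % (top1_prep.avg, top5_prep.avg))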