Activation quantization issues

Hi,

I have been trying to use static quantization to compress a MobileNetV3 model.
I posted a similar question a few days ago but did not provide enough details (that post will be deleted).
I have been able to quantize the weights of the model, but I cannot manage to properly quantize the activations (outputs).
I am using the SQNR metric to quantify the quality of the quantization.
I have tried the same quantization pipeline with other, simpler models, and again the weights were quantized well, but not the activations…
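
For reference, SQNR here is the standard signal-to-quantization-noise ratio in dB; the helper used below does something along these lines (a minimal sketch, my actual helper may differ slightly):

import torch

def SQNR(x: torch.Tensor, y: torch.Tensor) -> float:
    # Signal-to-quantization-noise ratio in dB between a float tensor x
    # and its dequantized counterpart y: 20 * log10(||x|| / ||x - y||).
    return (20 * torch.log10(torch.norm(x) / torch.norm(x - y))).item()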

My intuition is that something is wrong in my quantization pipeline, especially with the calibration part. The dataset we use for calibration is the validation dataset.
Here are the results we obtain for the weights:

Weight SQNR for backbone.0.0.weight: 44.185325622558594
Weight SQNR for backbone.1.block.0.0.weight: 46.40886688232422
Weight SQNR for backbone.1.block.1.fc1.weight: 45.98581314086914
Weight SQNR for backbone.1.block.1.fc2.weight: 48.130802154541016
Weight SQNR for backbone.1.block.2.0.weight: 42.821144104003906
Weight SQNR for backbone.2.block.0.0.weight: 43.544769287109375
Weight SQNR for backbone.2.block.1.0.weight: 45.990516662597656
Weight SQNR for backbone.2.block.2.0.weight: 40.75975036621094
Weight SQNR for backbone.3.block.0.0.weight: 41.81905746459961
Weight SQNR for backbone.3.block.1.0.weight: 43.50240707397461
Weight SQNR for backbone.3.block.2.0.weight: 42.69906997680664
Weight SQNR for backbone.4.block.0.0.weight: 42.20953369140625
Weight SQNR for backbone.4.block.1.0.weight: 43.526702880859375
Weight SQNR for backbone.4.block.2.fc1.weight: 37.113765716552734
Weight SQNR for backbone.4.block.2.fc2.weight: 48.13081359863281
Weight SQNR for backbone.4.block.3.0.weight: 42.22209930419922
Weight SQNR for backbone.5.block.0.0.weight: 42.994667053222656
Weight SQNR for backbone.5.block.1.0.weight: 39.69481658935547
Weight SQNR for backbone.5.block.2.fc1.weight: 36.91025161743164
Weight SQNR for backbone.5.block.2.fc2.weight: 40.59980392456055
Weight SQNR for backbone.5.block.3.0.weight: 40.33270263671875
Weight SQNR for backbone.6.block.0.0.weight: 43.93757629394531
Weight SQNR for backbone.6.block.1.0.weight: 40.58481216430664
Weight SQNR for backbone.6.block.2.fc1.weight: 37.284217834472656
Weight SQNR for backbone.6.block.2.fc2.weight: 40.9296875
Weight SQNR for backbone.6.block.3.0.weight: 41.66569519042969
Weight SQNR for backbone.7.block.0.0.weight: 42.66659164428711
Weight SQNR for backbone.7.block.1.0.weight: 39.36699676513672
Weight SQNR for backbone.7.block.2.fc1.weight: 36.698448181152344
Weight SQNR for backbone.7.block.2.fc2.weight: 45.01042175292969
Weight SQNR for backbone.7.block.3.0.weight: 41.98356246948242
Weight SQNR for backbone.8.block.0.0.weight: 43.977821350097656
Weight SQNR for backbone.8.block.1.0.weight: 40.311065673828125
Weight SQNR for backbone.8.block.2.fc1.weight: 39.59221267700195
Weight SQNR for backbone.8.block.2.fc2.weight: 44.09014892578125
Weight SQNR for backbone.8.block.3.0.weight: 42.539039611816406
Weight SQNR for backbone.9.block.0.0.weight: 43.509822845458984
Weight SQNR for backbone.9.block.1.0.weight: 45.22159957885742
Weight SQNR for backbone.9.block.2.fc1.weight: 38.97401428222656
Weight SQNR for backbone.9.block.2.fc2.weight: 42.31846237182617
Weight SQNR for backbone.9.block.3.0.weight: 42.1461181640625
Weight SQNR for backbone.10.block.0.0.weight: 43.392696380615234
Weight SQNR for backbone.10.block.1.0.weight: 40.46477508544922
Weight SQNR for backbone.10.block.2.fc1.weight: 37.95626449584961
Weight SQNR for backbone.10.block.2.fc2.weight: 40.91071319580078
Weight SQNR for backbone.10.block.3.0.weight: 41.7304801940918
Weight SQNR for backbone.11.block.0.0.weight: 43.457489013671875
Weight SQNR for backbone.11.block.1.0.weight: 40.413429260253906
Weight SQNR for backbone.11.block.2.fc1.weight: 39.38266372680664
Weight SQNR for backbone.11.block.2.fc2.weight: 41.352264404296875
Weight SQNR for backbone.11.block.3.0.weight: 41.471954345703125
Weight SQNR for backbone.12.0.weight: 42.99789810180664
Weight SQNR for upsampler.blocks.0.1.depthwise.weight: 48.0961799621582
Weight SQNR for upsampler.blocks.0.1.pointwise.weight: 48.11189270019531
Weight SQNR for upsampler.blocks.1.1.depthwise.weight: 48.03831481933594
Weight SQNR for upsampler.blocks.1.1.pointwise.weight: 48.17645263671875
Weight SQNR for upsampler.blocks.2.1.depthwise.weight: 48.20476531982422
Weight SQNR for upsampler.blocks.2.1.pointwise.weight: 48.25660705566406
Weight SQNR for heads.heads.0.0.depthwise.weight: 48.215003967285156
Weight SQNR for heads.heads.0.0.pointwise.weight: 48.11094665527344
Weight SQNR for heads.heads.0.2.weight: 48.269100189208984
Weight SQNR for heads.heads.1.0.depthwise.weight: 48.15333557128906
Weight SQNR for heads.heads.1.0.pointwise.weight: 48.16814422607422
Weight SQNR for heads.heads.1.2.weight: 47.81619644165039
Weight SQNR for heads.heads.2.0.depthwise.weight: 47.815834045410156
Weight SQNR for heads.heads.2.0.pointwise.weight: 48.29985046386719
Weight SQNR for heads.heads.2.2.weight: 47.726993560791016

and for the activations:

Activations SQNR for backbone.0.0.stats: 20.457651138305664
Activations SQNR for backbone.0.2.stats: 19.50702476501465
Activations SQNR for backbone.1.block.0.0.stats: 11.450357437133789
Activations SQNR for backbone.1.block.1.fc1.stats: 0.0
Activations SQNR for backbone.1.block.1.fc2.stats: 42.31361389160156
Activations SQNR for backbone.1.block.2.0.stats: 2.9698023796081543
Activations SQNR for backbone.2.block.0.0.stats: 6.8071393966674805
Activations SQNR for backbone.2.block.1.0.stats: 6.263160228729248
Activations SQNR for backbone.2.block.2.0.stats: 1.28080415725708
Activations SQNR for backbone.3.block.0.0.stats: 6.484529972076416
Activations SQNR for backbone.3.block.1.0.stats: 8.040334701538086
Activations SQNR for backbone.3.block.2.0.stats: -0.46143054962158203
Activations SQNR for backbone.4.block.0.0.stats: 2.356440782546997
Activations SQNR for backbone.4.block.0.2.stats: 2.2479443550109863
Activations SQNR for backbone.4.block.1.0.stats: 4.715736389160156
Activations SQNR for backbone.4.block.1.2.stats: 5.560753345489502
Activations SQNR for backbone.4.block.2.fc1.stats: 0.0
Activations SQNR for backbone.4.block.2.fc2.stats: 22.666215896606445
Activations SQNR for backbone.4.block.3.0.stats: -2.1067328453063965
Activations SQNR for backbone.5.block.0.0.stats: 2.1233863830566406
Activations SQNR for backbone.5.block.0.2.stats: 0.9844297766685486
Activations SQNR for backbone.5.block.1.0.stats: 1.6204828023910522
Activations SQNR for backbone.5.block.1.2.stats: -0.4914323687553406
Activations SQNR for backbone.5.block.2.fc1.stats: 1.4354966878890991
Activations SQNR for backbone.5.block.2.fc2.stats: 8.31513500213623
Activations SQNR for backbone.5.block.3.0.stats: -0.6018408536911011
Activations SQNR for backbone.6.block.0.0.stats: 0.685958743095398
Activations SQNR for backbone.6.block.0.2.stats: 0.26973581314086914
Activations SQNR for backbone.6.block.1.0.stats: 0.42121437191963196
Activations SQNR for backbone.6.block.1.2.stats: -0.6432545781135559
Activations SQNR for backbone.6.block.2.fc1.stats: 0.46626076102256775
Activations SQNR for backbone.6.block.2.fc2.stats: 11.062928199768066
Activations SQNR for backbone.6.block.3.0.stats: -1.0595506429672241
Activations SQNR for backbone.7.block.0.0.stats: 0.37939104437828064
Activations SQNR for backbone.7.block.0.2.stats: 1.408747673034668
Activations SQNR for backbone.7.block.1.0.stats: 1.8498587608337402
Activations SQNR for backbone.7.block.1.2.stats: 2.0974295139312744
Activations SQNR for backbone.7.block.2.fc1.stats: 1.4677486419677734
Activations SQNR for backbone.7.block.2.fc2.stats: 4.2567901611328125
Activations SQNR for backbone.7.block.3.0.stats: -0.2707398533821106
Activations SQNR for backbone.8.block.0.0.stats: 1.8695651292800903
Activations SQNR for backbone.8.block.0.2.stats: 0.7241547107696533
Activations SQNR for backbone.8.block.1.0.stats: 1.1945180892944336
Activations SQNR for backbone.8.block.1.2.stats: -0.1473236083984375
Activations SQNR for backbone.8.block.2.fc1.stats: 2.4075896739959717
Activations SQNR for backbone.8.block.2.fc2.stats: 10.103607177734375
Activations SQNR for backbone.8.block.3.0.stats: -0.31602638959884644
Activations SQNR for backbone.9.block.0.0.stats: 3.1465210914611816
Activations SQNR for backbone.9.block.0.2.stats: 0.7296209931373596
Activations SQNR for backbone.9.block.1.0.stats: 0.7618334889411926
Activations SQNR for backbone.9.block.1.2.stats: 0.41660159826278687
Activations SQNR for backbone.9.block.2.fc1.stats: 0.13484197854995728
Activations SQNR for backbone.9.block.2.fc2.stats: 7.269928932189941
Activations SQNR for backbone.9.block.3.0.stats: -0.7541333436965942
Activations SQNR for backbone.10.block.0.0.stats: 1.6517661809921265
Activations SQNR for backbone.10.block.0.2.stats: 0.45318660140037537
Activations SQNR for backbone.10.block.1.0.stats: 0.56326824426651
Activations SQNR for backbone.10.block.1.2.stats: -0.20691633224487305
Activations SQNR for backbone.10.block.2.fc1.stats: -0.17871148884296417
Activations SQNR for backbone.10.block.2.fc2.stats: 8.563508033752441
Activations SQNR for backbone.10.block.3.0.stats: -0.730739176273346
Activations SQNR for backbone.11.block.0.0.stats: 2.208824396133423
Activations SQNR for backbone.11.block.0.2.stats: 0.4115597903728485
Activations SQNR for backbone.11.block.1.0.stats: -0.23680844902992249
Activations SQNR for backbone.11.block.1.2.stats: -0.38858556747436523
Activations SQNR for backbone.11.block.2.fc1.stats: -0.7758683562278748
Activations SQNR for backbone.11.block.2.fc2.stats: 8.365681648254395
Activations SQNR for backbone.11.block.3.0.stats: -0.35106655955314636
Activations SQNR for backbone.12.0.stats: 0.8061447143554688
Activations SQNR for backbone.12.2.stats: -0.1996762901544571
Activations SQNR for upsampler.blocks.0.1.depthwise.stats: -0.16188614070415497
Activations SQNR for upsampler.blocks.0.1.pointwise.stats: 0.4140855073928833
Activations SQNR for upsampler.blocks.1.1.depthwise.stats: 0.37494543194770813
Activations SQNR for upsampler.blocks.1.1.pointwise.stats: 0.6182276010513306
Activations SQNR for upsampler.blocks.2.1.depthwise.stats: 0.74042147397995
Activations SQNR for upsampler.blocks.2.1.pointwise.stats: 0.5121651887893677
Activations SQNR for heads.heads.0.0.depthwise.stats: 0.41690653562545776
Activations SQNR for heads.heads.0.0.pointwise.stats: 0.691654622554779
Activations SQNR for heads.heads.0.2.stats: 0.6497185826301575
Activations SQNR for heads.heads.1.0.depthwise.stats: 0.4326828718185425
Activations SQNR for heads.heads.1.0.pointwise.stats: 0.6541122198104858
Activations SQNR for heads.heads.1.2.stats: 0.5192033052444458
Activations SQNR for heads.heads.2.0.depthwise.stats: 0.35485032200813293
Activations SQNR for heads.heads.2.0.pointwise.stats: 0.6285597085952759
Activations SQNR for heads.heads.2.2.stats: 1.225141167640686

Here is the code we use for the quantization:

import copy

import torch
import torch.quantization._numeric_suite as ns  # numeric suite used for the SQNR comparisons below
from torch.quantization import quantize_fx
from tqdm import tqdm

# SQNR, logger, DatasetKeys and build_data_loader are helpers from our own codebase

def _quantize_model(cfg, args, base_model, qconfig_name='qnnpack'):
    logger.info("Start quantization")

    base_model.eval()

    def calibrate(model, data_loader):
        batch = None
        with torch.inference_mode():
            for i, batch in tqdm(enumerate(data_loader), total=len(data_loader), desc="Calibrate quantized model"):
                model(batch[DatasetKeys.INPUT].to(args.device))
                # even when calibrating on the whole dataset,
                # we still obtain poor performance
                if i >= 2:
                    break

        return batch
    if hasattr(base_model, 'to_exportable_model'):
        model_to_quantize = base_model.to_exportable_model()
    else:
        model_to_quantize = copy.deepcopy(base_model)

    prepare_custom_config_dict = None if not hasattr(base_model, 'FX_CONFIG_DICT') else base_model.FX_CONFIG_DICT
    model_to_quantize.eval()

    qconfig = torch.quantization.get_default_qconfig(qconfig_name)
    # model_to_quantize.qconfig = qconfig
    qconfig_dict = {"": qconfig}
    # prepare
    logger.info("Prepare model for quantization")
    # model_prepared = torch.quantization.prepare(model_to_quantize)
    model_prepared = quantize_fx.prepare_fx(
        copy.deepcopy(model_to_quantize),
        qconfig_dict,
        # example_inputs=(base_model.example_input_array,),
        prepare_custom_config_dict=prepare_custom_config_dict)

    # calibrate
    logger.info("Calibrate")
    dataloader = build_data_loader(cfg.data, cfg.batch_size, cfg.num_workers)
    loader = dataloader.val_dataloader() if cfg.data.val_dataset is not None else dataloader.train_dataloader()
    last_batch = calibrate(model_prepared, loader)

    # only calibration requires GPU
    base_model.example_input_array = base_model.example_input_array.cpu()
    model_prepared.to('cpu')

    # quantize
    logger.info("Quantize")
    model_quantized = quantize_fx.convert_fx(model_prepared).cpu()
    # model_quantized = torch.quantization.convert(model_prepared)
    model_traced = torch.jit.script(model_quantized.eval()).cpu()

    wt_compare_dict = ns.compare_weights(model_to_quantize.cpu().state_dict(), model_quantized.state_dict())
    for key in wt_compare_dict:
        logger.debug(
            f"Weight SQNR for {key}: {SQNR(wt_compare_dict[key]['float'], wt_compare_dict[key]['quantized'].dequantize())}")

    if last_batch is not None:
        logger.debug('#' * 100)
        with torch.inference_mode():
            act_compare_dict = ns.compare_model_outputs(
                model_to_quantize, model_quantized, last_batch[DatasetKeys.INPUT][0].unsqueeze(0).detach())

        for key in act_compare_dict:
            logger.debug(
                f"Activations SQNR for {key}: {SQNR(act_compare_dict[key]['float'][0], act_compare_dict[key]['quantized'][0].dequantize())}")

    return model_traced

What might be the error?
Thank you :slight_smile:

Hi Valentin,

Your setup looks fine to me. One thing to try is using:

backend = qconfig_name # "qnnpack"
qconfig_mapping = torch.ao.quantization.get_default_qconfig_mapping(backend)
model_prepared = quantize_fx.prepare_fx(model, qconfig_mapping, ...)

By default, the qconfigs in this default qconfig_mapping should quantize both the weights and the activations for linear and conv layers. This should work for MobileNetV3 since it's a conv-heavy model. Feel free to let me know if it doesn't work.
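
Roughly, the whole flow would then look like this (a sketch; on recent versions prepare_fx also takes example_inputs, and names like calibration_batches are placeholders for your own data):

import torch
from torch.ao.quantization import get_default_qconfig_mapping, quantize_fx

backend = "qnnpack"
qconfig_mapping = get_default_qconfig_mapping(backend)

model.eval()
example_inputs = (torch.randn(1, 3, 224, 224),)  # any representative input shape

# prepare: inserts observers for both weights and activations
model_prepared = quantize_fx.prepare_fx(model, qconfig_mapping, example_inputs)

# calibrate: run a few representative batches through the prepared model
with torch.inference_mode():
    for images in calibration_batches:  # placeholder for your calibration loader
        model_prepared(images)

# convert: produce the actual quantized model
model_quantized = quantize_fx.convert_fx(model_prepared)

After prepare_fx you should see activation observer modules in the prepared model; those are what calibration fills in.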

Best,
-Andrew

Hi Andrew,

It seems like I cannot find:

torch.ao.quantization.get_default_qconfig_mapping

in the documentation.

Valentin

Hi Valentin,

My apologies. This was added to master recently and may not be in your PyTorch version yet. You can check whether it exists in a Python shell. If it doesn't, you can just use torch.ao.quantization.get_default_qconfig_dict() instead and pass that to prepare_fx. This qconfig_dict should have more entries in it than just {"": default_qconfig}.
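
That is, something along these lines (a sketch for older versions, reusing the variables from your function):

from torch.ao.quantization import get_default_qconfig_dict
from torch.ao.quantization import quantize_fx

# defaults to the fbgemm backend; pass "qnnpack" instead if your version supports the backend argument
qconfig_dict = get_default_qconfig_dict()
model_prepared = quantize_fx.prepare_fx(
    copy.deepcopy(model_to_quantize),
    qconfig_dict,
    prepare_custom_config_dict=prepare_custom_config_dict)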

Best,
-Andrew