FX graph mode quantization on MobileNetV3 raises "NotImplementedError"

Hello everyone,

I am trying to quantize a MobileNetV3 model using FX graph mode quantization, following this tutorial: https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html

The quantization step itself seems to work: when I print the quantized model, I get:

GraphModule(
(features): Module(
(0): Module(
(0): QuantizedConv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), scale=0.09166989475488663, zero_point=64, padding=(1, 1))
(2): QuantizedHardswish()
)
(1): Module(
(block): Module(
(0): Module(
(0): QuantizedConvReLU2d(16, 16, kernel_size=(3, 3), stride=(1, 1), scale=0.05154227465391159, zero_point=0, padding=(1, 1), groups=16)
)
(1): Module(
(0): QuantizedConv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), scale=0.09911715239286423, zero_point=58)
(2): Identity()
)
)
)
(2): Module(
(block): Module(
(0): Module(
(0): QuantizedConvReLU2d(16, 64, kernel_size=(1, 1), stride=(1, 1), scale=0.0551120825111866, zero_point=0)
)
(1): Module(
(0): QuantizedConvReLU2d(64, 64, kernel_size=(3, 3), stride=(2, 2), scale=0.055621854960918427, zero_point=0, padding=(1, 1), groups=64)
)
(2): Module(
(0): QuantizedConv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), scale=0.09501516073942184, zero_point=66)
(2): Identity()
)
)
)
(3): Module(
(block): Module(
(0): Module(
(0): QuantizedConvReLU2d(24, 72, kernel_size=(1, 1), stride=(1, 1), scale=0.05194235220551491, zero_point=0)
)
(1): Module(
(0): QuantizedConvReLU2d(72, 72, kernel_size=(3, 3), stride=(1, 1), scale=0.05939812585711479, zero_point=0, padding=(1, 1), groups=72)
)
(2): Module(
(0): QuantizedConv2d(72, 24, kernel_size=(1, 1), stride=(1, 1), scale=0.09852656722068787, zero_point=66)
(2): Identity()
)
)
)
(4): Module(
(block): Module(
(0): Module(
(0): QuantizedConvReLU2d(24, 72, kernel_size=(1, 1), stride=(1, 1), scale=0.052594587206840515, zero_point=0)
)
(1): Module(
(0): QuantizedConvReLU2d(72, 72, kernel_size=(5, 5), stride=(2, 2), scale=0.047535136342048645, zero_point=0, padding=(2, 2), groups=72)
)
(2): Module(
(fc1): QuantizedConvReLU2d(72, 24, kernel_size=(1, 1), stride=(1, 1), scale=0.030134592205286026, zero_point=0)
(fc2): QuantizedConv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), scale=0.038337405771017075, zero_point=74)
)
(3): Module(
(0): QuantizedConv2d(72, 40, kernel_size=(1, 1), stride=(1, 1), scale=0.09368766099214554, zero_point=68)
(2): Identity()
)
)
)
(5): Module(
(block): Module(
(0): Module(
(0): QuantizedConvReLU2d(40, 120, kernel_size=(1, 1), stride=(1, 1), scale=0.04279119148850441, zero_point=0)
)
(1): Module(
(0): QuantizedConvReLU2d(120, 120, kernel_size=(5, 5), stride=(1, 1), scale=0.043220121413469315, zero_point=0, padding=(2, 2), groups=120)
)
(2): Module(
(fc1): QuantizedConvReLU2d(120, 32, kernel_size=(1, 1), stride=(1, 1), scale=0.03446542099118233, zero_point=0)
(fc2): QuantizedConv2d(32, 120, kernel_size=(1, 1), stride=(1, 1), scale=0.046296607702970505, zero_point=61)
)
(3): Module(
(0): QuantizedConv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), scale=0.0773073136806488, zero_point=66)
(2): Identity()
)
)
)
(6): Module(
(block): Module(
(0): Module(
(0): QuantizedConvReLU2d(40, 120, kernel_size=(1, 1), stride=(1, 1), scale=0.0431692935526371, zero_point=0)
)
(1): Module(
(0): QuantizedConvReLU2d(120, 120, kernel_size=(5, 5), stride=(1, 1), scale=0.046419981867074966, zero_point=0, padding=(2, 2), groups=120)
)
(2): Module(
(fc1): QuantizedConvReLU2d(120, 32, kernel_size=(1, 1), stride=(1, 1), scale=0.02330135554075241, zero_point=0)
(fc2): QuantizedConv2d(32, 120, kernel_size=(1, 1), stride=(1, 1), scale=0.03669281676411629, zero_point=54)
)
(3): Module(
(0): QuantizedConv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), scale=0.07971568405628204, zero_point=61)
(2): Identity()
)
)
)
(7): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), scale=0.08144847303628922, zero_point=66)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(240, 240, kernel_size=(3, 3), stride=(2, 2), scale=0.09795959293842316, zero_point=61, padding=(1, 1), groups=240)
(2): QuantizedHardswish()
)
(2): Module(
(0): QuantizedConv2d(240, 80, kernel_size=(1, 1), stride=(1, 1), scale=0.09617093950510025, zero_point=64)
(2): Identity()
)
)
)
(8): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(80, 200, kernel_size=(1, 1), stride=(1, 1), scale=0.09652244299650192, zero_point=62)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(200, 200, kernel_size=(3, 3), stride=(1, 1), scale=0.10839337855577469, zero_point=60, padding=(1, 1), groups=200)
(2): QuantizedHardswish()
)
(2): Module(
(0): QuantizedConv2d(200, 80, kernel_size=(1, 1), stride=(1, 1), scale=0.09403073787689209, zero_point=62)
(2): Identity()
)
)
)
(9): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), scale=0.08897814154624939, zero_point=62)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), scale=0.10986955463886261, zero_point=64, padding=(1, 1), groups=184)
(2): QuantizedHardswish()
)
(2): Module(
(0): QuantizedConv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), scale=0.09475481510162354, zero_point=67)
(2): Identity()
)
)
)
(10): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), scale=0.09242968261241913, zero_point=66)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), scale=0.10907693207263947, zero_point=59, padding=(1, 1), groups=184)
(2): QuantizedHardswish()
)
(2): Module(
(0): QuantizedConv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), scale=0.10109627991914749, zero_point=65)
(2): Identity()
)
)
)
(11): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(80, 480, kernel_size=(1, 1), stride=(1, 1), scale=0.09138453751802444, zero_point=62)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(480, 480, kernel_size=(3, 3), stride=(1, 1), scale=0.10354351252317429, zero_point=61, padding=(1, 1), groups=480)
(2): QuantizedHardswish()
)
(2): Module(
(fc1): QuantizedConvReLU2d(480, 120, kernel_size=(1, 1), stride=(1, 1), scale=0.0344068706035614, zero_point=0)
(fc2): QuantizedConv2d(120, 480, kernel_size=(1, 1), stride=(1, 1), scale=0.037343572825193405, zero_point=62)
)
(3): Module(
(0): QuantizedConv2d(480, 112, kernel_size=(1, 1), stride=(1, 1), scale=0.08420202136039734, zero_point=62)
(2): Identity()
)
)
)
(12): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), scale=0.08766956627368927, zero_point=67)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), scale=0.09654679149389267, zero_point=61, padding=(1, 1), groups=672)
(2): QuantizedHardswish()
)
(2): Module(
(fc1): QuantizedConvReLU2d(672, 168, kernel_size=(1, 1), stride=(1, 1), scale=0.03521481901407242, zero_point=0)
(fc2): QuantizedConv2d(168, 672, kernel_size=(1, 1), stride=(1, 1), scale=0.04141794145107269, zero_point=64)
)
(3): Module(
(0): QuantizedConv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), scale=0.08542641252279282, zero_point=66)
(2): Identity()
)
)
)
(13): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), scale=0.08445318043231964, zero_point=64)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(672, 672, kernel_size=(5, 5), stride=(2, 2), scale=0.0685504674911499, zero_point=67, padding=(2, 2), groups=672)
(2): QuantizedHardswish()
)
(2): Module(
(fc1): QuantizedConvReLU2d(672, 168, kernel_size=(1, 1), stride=(1, 1), scale=0.04330335929989815, zero_point=0)
(fc2): QuantizedConv2d(168, 672, kernel_size=(1, 1), stride=(1, 1), scale=0.05540220066905022, zero_point=68)
)
(3): Module(
(0): QuantizedConv2d(672, 160, kernel_size=(1, 1), stride=(1, 1), scale=0.05899979919195175, zero_point=65)
(2): Identity()
)
)
)
(14): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), scale=0.06171604245901108, zero_point=65)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), scale=0.05438883602619171, zero_point=59, padding=(2, 2), groups=960)
(2): QuantizedHardswish()
)
(2): Module(
(fc1): QuantizedConvReLU2d(960, 240, kernel_size=(1, 1), stride=(1, 1), scale=0.0247460026293993, zero_point=0)
(fc2): QuantizedConv2d(240, 960, kernel_size=(1, 1), stride=(1, 1), scale=0.027777383103966713, zero_point=69)
)
(3): Module(
(0): QuantizedConv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), scale=0.06441330909729004, zero_point=60)
(2): Identity()
)
)
)
(15): Module(
(block): Module(
(0): Module(
(0): QuantizedConv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), scale=0.05954812839627266, zero_point=60)
(2): QuantizedHardswish()
)
(1): Module(
(0): QuantizedConv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), scale=0.05335007980465889, zero_point=68, padding=(2, 2), groups=960)
(2): QuantizedHardswish()
)
(2): Module(
(fc1): QuantizedConvReLU2d(960, 240, kernel_size=(1, 1), stride=(1, 1), scale=0.02065999060869217, zero_point=0)
(fc2): QuantizedConv2d(240, 960, kernel_size=(1, 1), stride=(1, 1), scale=0.024499310180544853, zero_point=64)
)
(3): Module(
(0): QuantizedConv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), scale=0.05633355677127838, zero_point=66)
(2): Identity()
)
)
)
(16): Module(
(0): QuantizedConv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), scale=0.05573510006070137, zero_point=59)
(2): QuantizedHardswish()
)
)
(avgpool): AdaptiveAvgPool2d(output_size=1)
(classifier): Module(
(0): QuantizedLinear(in_features=960, out_features=1280, scale=0.07583604007959366, zero_point=53, qscheme=torch.per_channel_affine)
(1): QuantizedHardswish()
(2): Dropout(p=0.2, inplace=True)
(3): QuantizedLinear(in_features=1280, out_features=1000, scale=0.3153918385505676, zero_point=18, qscheme=torch.per_channel_affine)
)
)

Now I want to look at the top-1 and top-5 accuracy:

top1, top5 = evaluate(quantized_model, criterion, data_loader_test)
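For readers without the tutorial at hand: the `evaluate` helper is not shown in this post, so the following is only a plain-Python sketch of what top-1 and top-5 accuracy measure. The function name `topk_accuracy` and the toy scores are made up for illustration and are not the tutorial's code.

```python
from typing import List, Sequence, Tuple


def topk_accuracy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    ks: Tuple[int, ...] = (1, 5),
) -> List[float]:
    """Percentage of samples whose target is among the k highest-scoring classes."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    if not targets:
        raise ValueError("need at least one sample")
    hits = [0] * len(ks)
    for scores, target in zip(logits, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for j, k in enumerate(ks):
            if target in ranked[:k]:
                hits[j] += 1
    return [100.0 * h / len(targets) for h in hits]


# Toy batch (hypothetical numbers): 3 samples, 6 classes.
toy_logits = [
    [0.1, 0.9, 0.0, 0.0, 0.0, 0.0],  # highest score: class 1
    [0.5, 0.1, 0.4, 0.3, 0.2, 0.0],  # highest score: class 0, class 2 is second
    [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # class 5 is ranked last
]
toy_targets = [1, 2, 5]
toy_top1, toy_top5 = topk_accuracy(toy_logits, toy_targets)
```

Here one of three samples is correct at top-1 and two of three at top-5.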

Then I get the following error:

---------------------------------------------------------------------------
NotImplementedError: Could not run 'aten::hardsigmoid.out' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit xxx for possible resolutions. 'aten::hardsigmoid.out' is only available for these backends: [CPU, CUDA, Meta, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

Thanks for reporting! This is a bug. I filed "FX graph mode quantization broken for torchvision MobileNetV3" (Issue #68250 in pytorch/pytorch) for this, and someone on our team will take a look ASAP.
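For context on what the error message means: `aten::hardsigmoid.out` was handed a quantized tensor, but no kernel for it is registered on the `QuantizedCPU` backend. The plain-Python sketch below shows the arithmetic such an op would have to perform on a single quantized value (dequantize, apply the float hardsigmoid, requantize). The helper names and the scale/zero-point values are assumptions made up for illustration; this is not PyTorch's implementation and not the fix tracked in Issue #68250.

```python
def quantize(x: float, scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> int:
    """Affine quantization to an unsigned 8-bit range (illustrative only)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Inverse of the affine mapping: integer code back to a float value."""
    return (q - zero_point) * scale


def hardsigmoid(x: float) -> float:
    """Float hardsigmoid: relu6(x + 3) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


def hardsigmoid_via_float(q_in: int, in_scale: float, in_zp: int,
                          out_scale: float, out_zp: int) -> int:
    """Hypothetical fallback: dequantize, run the float op, requantize."""
    x = dequantize(q_in, in_scale, in_zp)
    return quantize(hardsigmoid(x), out_scale, out_zp)


# Made-up quantization parameters, chosen only for the example.
IN_SCALE, IN_ZP = 0.05, 64
OUT_SCALE, OUT_ZP = 1.0 / 256.0, 0

# The input code 64 represents 0.0, and hardsigmoid(0.0) is 0.5.
q_mid = hardsigmoid_via_float(64, IN_SCALE, IN_ZP, OUT_SCALE, OUT_ZP)
mid_value = dequantize(q_mid, OUT_SCALE, OUT_ZP)
```

The point is only that an op without a quantized kernel needs its input converted back to float first (or a dedicated quantized kernel); how the converted graph should handle this for MobileNetV3 is up to the fix for the issue above.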