FBGEMM with PyTorch Mobile

Is it possible to run a model with the fbgemm qconfig on mobile, or is it x86-only? Simply plugging such a model into the demo app triggers a QNNPACK assert here: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp#L223
It seems that FBGEMM support was disabled by this commit, though the reason is unclear: https://github.com/pytorch/pytorch/commit/6fead9afd4cdc6306fb0e2180ca625160b59ea71
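As a quick sanity check, you can ask a given PyTorch build which quantization backends it was compiled with (an illustrative snippet; the exact list depends on the build -- x86 server builds typically include fbgemm, mobile builds ship qnnpack):

```python
import torch

# Engines compiled into the current PyTorch build, e.g. ['none', 'fbgemm', 'qnnpack']
print(torch.backends.quantized.supported_engines)

# The engine currently selected for running quantized ops
print(torch.backends.quantized.engine)
```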

I wasn’t able to get good results from a QNNPACK-compatible per-tensor quantization qconfig. Target metric value relative to the fp32 model:
get_default_qconfig('fbgemm') -> 99.8%
get_default_qconfig('qnnpack') -> 58.5%
default_qconfig -> 54.4%
Is there any way to reduce that gap without changing architecture?

FBGEMM is supported only on x86. You can get very good accuracy with qnnpack as well.
Please make sure that when you set:

qconfig = torch.quantization.get_default_qconfig('qnnpack')

You also do:

torch.backends.quantized.engine = 'qnnpack'

before running the model.
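Putting the two settings together, the eager-mode flow looks roughly like this (a minimal sketch using a toy model, not the poster's architecture):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    # Toy model for illustration only
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

# Both the qconfig AND the engine must be set to qnnpack.
torch.backends.quantized.engine = 'qnnpack'
model = TinyNet().eval()
model.qconfig = torch.quantization.get_default_qconfig('qnnpack')

# Fuse conv+relu, insert observers, calibrate, then convert.
model = torch.quantization.fuse_modules(model, [['conv', 'relu']])
model = torch.quantization.prepare(model)
model(torch.randn(1, 3, 32, 32))        # calibration pass (use real data)
qmodel = torch.quantization.convert(model)

out = qmodel(torch.randn(1, 3, 32, 32))
print(out.shape)
```

Calibration here uses random data purely to keep the sketch self-contained; in practice you would run representative inputs through the prepared model.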
The poorer accuracy numbers are likely due to a known issue where FBGEMM saturates for large weight/activation values:

Thanks. With the engine set, preparation, calibration, and conversion of the model work fine, but evaluation triggers errors like: Error in QNNPACK: failed to create convolution with 0.1966128 input scale, 1.698165 kernel scale, and 0.2075303 output scale: convolution scale 1.608829 is greater or equal to 1.0. The cause seems to be an SE block implemented via a 1x1 convolution that receives a 1x1 input. I probably should have used Linear anyway, but maybe this will be useful to someone.
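For reference, a sketch of that workaround: in an SE-style block, the input after global pooling is 1x1, so the 1x1 convolutions are mathematically equivalent to Linear layers, and using nn.Linear sidesteps QNNPACK's constraint that input_scale * kernel_scale / output_scale be less than 1.0 for convolutions. The block below is illustrative, not the poster's actual model:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Hypothetical squeeze-and-excitation block using Linear layers
    # in place of 1x1 convolutions after global average pooling.
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(channels, channels // reduction)  # was Conv2d(c, c//r, 1)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(channels // reduction, channels)  # was Conv2d(c//r, c, 1)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        s = self.pool(x).flatten(1)                 # N x C
        s = self.gate(self.fc2(self.relu(self.fc1(s))))
        return x * s.view(x.size(0), -1, 1, 1)      # channel-wise rescaling

se = SEBlock(16)
out = se(torch.randn(2, 16, 8, 8))
print(out.shape)
```

Note that in a fully quantized model the elementwise multiply would also need to go through torch.nn.quantized.FloatFunctional; this fp32 sketch only shows the Conv2d-to-Linear substitution.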

OK, I’ve managed to get good results from QNNPACK. Maybe torch.backends.quantized.engine should be mentioned somewhere on the quantization page?

Great that this worked! We will make sure to mention this on our quantization page. Thanks for the suggestion! cc @raghuramank100