How to perform quantization of a model in PyTorch?

Hello everyone!

I have trained the MobileNetV2 + SSD Lite model in PyTorch from ‘https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/mobilenet_v2_ssd_lite.py’. Now I want to use it on a Raspberry Pi 3.

I converted the ‘.pth’ model into a Caffe2 model through the ONNX representation and got two files for the Caffe2 framework: init_net.pb and predict_net.pb.

As far as I know, to accelerate the model on mobile systems such as the RPi3 (B/B+), I should use the QNNPACK library, which enables low-precision inference using operators with the int8 data type.

How do I quantize this model?
How can I run low-precision inference using QNNPACK?
Are there any tutorials about it?

Thnx.

Hi @r3krut,

This category is for Glow, which is a different PyTorch backend from Caffe2 (which "natively integrates QNNPACK"). Glow primarily targets neural network accelerators, though it does have a CPU backend and supports automatic profiling + quantization. If you want to use QNNPACK, I believe all you need to do is make sure your model (e.g. predict_net) is using operators such as Int8Conv, Int8FC, etc. and your Caffe2 model would use it.
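As a rough way to check whether a net already uses quantized operators, you can scan its text-format dump for operator types. The sketch below is only an illustration: the embedded `PBTXT` string is a made-up excerpt, the helper names are hypothetical, and it assumes quantized operators can be recognized by the `Int8` prefix seen in names like Int8Conv and Int8FC. It does a plain text match instead of parsing the protobuf, so treat the result as a hint.

```python
import re
from collections import Counter

# Hypothetical excerpt of a text-format net; a real predict_net.pbtxt is much longer.
PBTXT = '''
op {
  input: "0"
  output: "497"
  type: "Conv"
}
op {
  input: "497"
  output: "498"
  type: "Relu"
}
op {
  input: "498"
  output: "499"
  type: "Int8Conv"
}
'''


def count_op_types(pbtxt_text):
    """Count operator types by matching lines of the form: type: "Name"."""
    pattern = r'^\s*type:\s*"([^"]+)"'
    return Counter(re.findall(pattern, pbtxt_text, flags=re.MULTILINE))


def int8_fraction(counts):
    """Fraction of operators whose type name starts with "Int8"."""
    total = sum(counts.values())
    int8 = sum(n for name, n in counts.items() if name.startswith("Int8"))
    return int8 / total if total else 0.0


counts = count_op_types(PBTXT)
print(dict(counts))
print(f"Int8 operators: {int8_fraction(counts):.0%}")
```

A net that reports no Int8 operators at all, like the float model discussed in this thread, has not been quantized yet.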

However, if your network is not quantized and/or you don’t want to install Caffe2 on your Raspberry Pi, you could try to use Glow to profile your model, quantize it, and then save what we call an ahead-of-time compiled “bundle”, which is just a binary to copy to your Raspberry Pi3 to run (see docs here and here). Note that it may not perform as well as QNNPACK; we are more focused on accelerator backends right now.

Thanks,
Jordan

Is it possible to quantize a .pb in PyTorch and get a quantized .pb directly, like the Bazel quantization tools? Any tutorial would be appreciated!

Thanks for the answer @jfix, but I’m a bit confused.
At the moment my model predict_net.pb does not use operators such as Int8Conv, etc.
How do I force my model to use these operators?
Should I change operators such as Conv to Int8Conv manually?
Which file should I change? predict_net.pb or predict_net.pbtxt?

Example of some op from my predict_net.pbtxt:

op {
  input: "0"
  input: "1"
  output: "497"
  name: ""
  type: "Conv"
  arg {
    name: "strides"
    ints: 2
    ints: 2
  }
  arg {
    name: "pads"
    ints: 1
    ints: 1
    ints: 1
    ints: 1
  }
  arg {
    name: "kernels"
    ints: 3
    ints: 3
  }
  arg {
    name: "group"
    i: 1
  }
  arg {
    name: "dilations"
    ints: 1
    ints: 1
  }
}

Do I have to make changes here or not? Should I replace type: "Conv" with type: "Int8Conv"?
Please help me deal with this. Thanks.

So it looks like your model is only in float right now. You cannot simply replace Conv with Int8Conv, etc. To use quantization you need to know the quantization parameters to use for each operator. In Glow we call these scale and offset; in Caffe2 they are called Y_scale and Y_zero_point. These are usually based on the actual values you expect to flow through your graph.
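To make the role of these parameters concrete, here is a minimal sketch of how a scale and offset could be derived from an observed value range and then used to map floats to 8-bit integers and back. It assumes a signed int8 range of [-128, 127] and the affine mapping real = scale * (q - offset); the function names are hypothetical and this is not the code that Glow or Caffe2 actually runs.

```python
def choose_params(lo, hi, qmin=-128, qmax=127):
    """Pick (scale, offset) so the real range [lo, hi] maps onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: every observed value was 0.0
    offset = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, offset))


def quantize(x, scale, offset, qmin=-128, qmax=127):
    """Map a float to the nearest integer level, clamped to [qmin, qmax]."""
    q = int(round(x / scale)) + offset
    return max(qmin, min(qmax, q))


def dequantize(q, scale, offset):
    """Map an integer level back to its approximate float value."""
    return scale * (q - offset)


# Hypothetical values observed at the output of one operator.
observed = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, offset = choose_params(min(observed), max(observed))

for x in observed:
    q = quantize(x, scale, offset)
    print(f"{x:+.4f} -> q={q:4d} -> {dequantize(q, scale, offset):+.4f}")
```

This is why the operator type cannot just be renamed: without a representative range for each tensor there is no sensible scale or offset to attach to the operator.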

If you don’t know what the scales/offsets should be (likely the case), one option would be to use Glow’s profiling and quantization to quantize automatically. Like I said in my previous comment:

However, if your network is not quantized and/or you don’t want to install Caffe2 on your Raspberry Pi, you could try to use Glow to profile your model, quantize it, and then save what we call an ahead-of-time compiled “bundle”, which is just a binary to copy to your Raspberry Pi3 to run (see docs here and here). Note that it may not perform as well as QNNPACK; we are more focused on accelerator backends right now.

Again, this is not QNNPACK; Glow does not use it. If you are interested in using QNNPACK and Caffe2 on your Raspberry Pi then you could try asking the question in a separate category.

We always first import Caffe2 or ONNX protos and convert them into Glow IR, and then profile/quantize the Glow IR from there. However, once you’re in Glow IR, there is currently no way to generate anything back out to Caffe2/ONNX/PyTorch protos, whether quantized or not.
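For intuition about what the profiling step gathers, here is a small sketch that records the minimum and maximum value seen for each named tensor across a few calibration batches. Everything in it is an assumption made for illustration: the `RangeProfile` class, the tensor names, and the numbers are invented, and Glow's own profiler operates on Glow IR and is not implemented this way.

```python
class RangeProfile:
    """Tracks the smallest and largest value seen for each named tensor."""

    def __init__(self):
        self.ranges = {}

    def observe(self, name, values):
        """Fold one batch of values for tensor `name` into the running range."""
        lo, hi = min(values), max(values)
        if name in self.ranges:
            old_lo, old_hi = self.ranges[name]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        self.ranges[name] = (lo, hi)


# Hypothetical calibration run: two batches of activations keyed by tensor name.
batches = [
    {"conv1_out": [-0.5, 0.2, 1.5], "relu1_out": [0.0, 0.2, 1.5]},
    {"conv1_out": [-2.0, 0.1, 0.9], "relu1_out": [0.0, 0.1, 0.9]},
]

profile = RangeProfile()
for batch in batches:
    for name, values in batch.items():
        profile.observe(name, values)

for name, (lo, hi) in sorted(profile.ranges.items()):
    print(f"{name}: min={lo}, max={hi}")
```

The recorded ranges are the kind of information from which a per-tensor scale and offset can then be chosen, which is why the calibration inputs should resemble the data the model will see in practice.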

If this might fit your needs, you can always follow this tutorial to get ONNX or Caffe2 from your PyTorch model, which you can then import to Glow.

Hi,

Is there an example of taking a float32 ONNX graph, quantizing it with Glow, and generating Glow IR? Is there a Python interface to Glow profiling/quantization?

Thanks,
Sikandar

You can follow the instructions here on how to gather a profile of a model and then quantize the model. You just need an ONNX proto to load into Glow; see the page on Testing here, which discusses how to load a model using one of our example proto model Loaders. We have some limited support for Python via PyTorch through the ONNXIFI interface; you can find info here. Otherwise, once you’ve built Glow in C++, it’s relatively straightforward to run the Testing Loader examples I linked to above and quantize/run your model.

Hi @jfix, it seems that Glow’s ./bin/image-classifier currently supports only image classification models?
Is there a way to quantize an image generator model, or other kinds of models? Thanks.

Hi @eric4337, we currently also have an NMT model driver called text-translator, but it’s for pre-unrolled NMT models. We also have a model-runner driver, but it’s very simple and just for testing, for models without any non-Constant inputs.

If you want to try other models you need to create your own driver, probably based on tools/loader/ImageClassifier.cpp if you’re interested in image-based models. This mostly means making sure you correctly load the inputs and read back the outputs based on their expected shape(s) and datatype(s). Also, depending on the model, you may need to add additional operator support.

Once you have those things done you can quantize the model.

Hi,

Look what is available here: https://github.com/opencv/openvino_training_extensions/tree/develop/pytorch_toolkit/nncf.

This is a Quantization-Aware Training framework for PyTorch with the ability to export the quantized model to ONNX.