How to perform quantization of a model in PyTorch?

r3krut · March 21, 2019, 10:08pm

Hello everyone!

I have trained the MobileNetV2 + SSD Lite model in PyTorch from ‘https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/mobilenet_v2_ssd_lite.py’. Now I want to use it on a Raspberry Pi 3.

I converted the ‘.pth’ model into a Caffe2 model via its ONNX representation and got two files for the Caffe2 framework: init_net.pb and predict_net.pb.

As far as I know, to accelerate the model on mobile systems such as the RPi3 (B/B+) I should use the QNNPACK library, which allows low-precision inference using operators with the int8 data type.

How to perform quantization of this model?
How can I make low-precision inference using QNNPACK?
Maybe there are some tutorials about it?

Thnx.

jfix · March 25, 2019, 4:21pm

Hi @r3krut,

This category is for Glow, which is a different PyTorch backend from Caffe2 (which "natively integrates QNNPACK"). Glow primarily targets neural network accelerators, though it does have a CPU backend and supports automatic profiling + quantization. If you want to use QNNPACK, I believe all you need to do is make sure your model (e.g. predict_net) is using operators such as Int8Conv, Int8FC, etc. and your Caffe2 model would use it.
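As a quick way to check which operator types a predict_net.pbtxt actually contains, something like the sketch below could be used. It is only a plain-text scan, not a real protobuf parser; it assumes the text-format layout shown elsewhere in this thread (one field per line, `op { ... }` blocks at the top level), and the function name `op_types` and the sample text are made up for illustration.

```python
import re
from collections import Counter

def op_types(pbtxt: str) -> Counter:
    """Count the `type` field of each top-level `op { ... }` block.

    Assumption: one field per line and braces on their own line or at
    the end of a line, as in typical protobuf text dumps.
    """
    counts = Counter()
    depth = 0        # current brace nesting level
    in_op = False    # True while inside a top-level op block
    for raw in pbtxt.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            if depth == 0 and line.split()[0] == "op":
                in_op = True
            depth += 1
        elif line == "}":
            depth -= 1
            if depth == 0:
                in_op = False
        elif in_op and depth == 1:
            # Only fields directly inside the op block, not inside arg blocks.
            m = re.match(r'type:\s*"([^"]*)"', line)
            if m:
                counts[m.group(1)] += 1
    return counts

# Made-up two-operator net, used only to exercise the function.
sample = '''
name: "net"
type: "dag"
op {
  input: "0"
  output: "497"
  type: "Conv"
  arg {
    name: "strides"
    ints: 2
  }
}
op {
  input: "497"
  output: "498"
  type: "Relu"
}
'''

counts = op_types(sample)
uses_int8 = any(t.startswith("Int8") for t in counts)
print(dict(counts), "uses Int8 ops:", uses_int8)
```

If no type starting with Int8 shows up, the net is still a float model.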

However, if your network is not quantized and/or you don’t want to install Caffe2 on your Raspberry Pi, you could try to use Glow to profile your model, quantize it, and then save what we call an ahead-of-time compiled “bundle”, which is just a binary to copy to your Raspberry Pi 3 to run (see docs here and here). Note that it may not perform as well as QNNPACK; we are more focused on accelerator backends right now.

Thanks,
Jordan

Stella · March 30, 2019, 4:29am

Is it possible to quantize a .pb in PyTorch and get a quantized .pb directly, like the Bazel quantization tools? Any tutorial would be appreciated!

r3krut · April 4, 2019, 4:26pm

Thanks for the answer, @jfix, but I’m a bit confused.
At the moment my model predict_net.pb does not use operators such as Int8Conv and etc.
How do I force my model to use these operations?
Should I change operators such as Conv to Int8Conv manually?
Which file should I change? predict_net.pb or predict_net.pbtxt?

Example of some op from my predict_net.pbtxt:

op {
  input: "0"
  input: "1"
  output: "497"
  name: ""
  type: "Conv"
  arg {
    name: "strides"
    ints: 2
    ints: 2
  }
  arg {
    name: "pads"
    ints: 1
    ints: 1
    ints: 1
    ints: 1
  }
  arg {
    name: "kernels"
    ints: 3
    ints: 3
  }
  arg {
    name: "group"
    i: 1
  }
  arg {
    name: "dilations"
    ints: 1
    ints: 1
  }
}

Do I have to make changes here or not? Should I replace type: "Conv" with type: "Int8Conv"?
Help me deal with this, please. Thnx.

jfix · April 5, 2019, 6:19am

So it looks like your model is float-only right now. You cannot simply replace Conv with Int8Conv etc.; to use quantization you need to know the quantization parameters for each operator. In Glow we call these scale and offset; in Caffe2 they are called Y_scale and Y_zero_point. These are usually based on the actual values you expect to flow through your graph.
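To make the role of scale and offset concrete, here is a minimal sketch of generic affine int8 quantization, assuming the common convention float ≈ scale * (q - offset). The function names and the example range are made up, and the exact integer ranges and rounding used by Glow and Caffe2 may differ from this.

```python
def choose_qparams(min_val, max_val, qmin=-128, qmax=127):
    """Derive (scale, offset) from an observed float range.

    Assumption: the range is widened to include 0.0 so that zero maps
    exactly onto an integer value.
    """
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    if max_val == min_val:
        return 1.0, 0  # degenerate range: every value is zero
    scale = (max_val - min_val) / (qmax - qmin)
    offset = int(round(qmin - min_val / scale))
    offset = max(qmin, min(qmax, offset))
    return scale, offset

def quantize(x, scale, offset, qmin=-128, qmax=127):
    """Map a float to a clamped integer in [qmin, qmax]."""
    q = int(round(x / scale)) + offset
    return max(qmin, min(qmax, q))

def dequantize(q, scale, offset):
    """Map a quantized integer back to an approximate float."""
    return scale * (q - offset)

# Hypothetical activation range for one operator output.
scale, offset = choose_qparams(-1.0, 3.0)
q = quantize(0.5, scale, offset)
print(scale, offset, q, dequantize(q, scale, offset))
```

This is why the operator type cannot just be renamed: each quantized operator needs a scale and offset that match the value range it actually sees.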

If you don’t know what the scales/offsets should be (likely the case), one option would be to use Glow’s profiling and quantization to quantize automatically. Like I said in my previous comment:

However, if your network is not quantized and/or you don’t want to install Caffe2 on your Raspberry Pi, you could try to use Glow to profile your model, quantize it, and then save what we call an ahead-of-time compiled “bundle”, which is just a binary to copy to your Raspberry Pi 3 to run (see docs here and here). Note that it may not perform as well as QNNPACK; we are more focused on accelerator backends right now.

Again, this is not QNNPACK; Glow does not use it. If you are interested in using QNNPACK and Caffe2 on your Raspberry Pi then you could try asking the question in a separate category.

jfix · April 5, 2019, 6:24am

We always first import Caffe2 or ONNX protos, convert them into Glow IR, and then profile/quantize the Glow IR from there. However, once you’re in Glow IR, there is currently no way to generate anything back out to Caffe2/ONNX/PyTorch protos, whether quantized or not.

If this might fit your needs, you can always follow this tutorial to get ONNX or Caffe2 from your PyTorch model, which you can then import to Glow.

symashayak · April 26, 2019, 6:25pm

Hi,

Is there an example of taking a float32 ONNX graph, quantizing it with Glow and generating Glow IR? Is there a Python interface to Glow profiling/quantization?

Thanks,
Sikandar

jfix · April 28, 2019, 3:36am

You can follow the instructions here on how to gather a profile of a model and then quantize the model. You just need an ONNX proto to load into Glow; see the page on Testing here, which discusses how to load a model using one of our example proto model Loaders. We have some limited support for Python via PyTorch through the ONNXIFI interface; you can find info here. Otherwise, once you’ve built Glow in C++, it’s relatively straightforward to run the Testing Loader examples I linked to above and quantize/run your model.
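Conceptually, gathering a profile means running the float model on representative inputs and recording the value range seen at each tensor. The sketch below illustrates only that idea in plain Python. It is not Glow's profiler or its file format, and the function name, tensor names, and values are all made up.

```python
def gather_profile(runs):
    """Merge per-inference activations into one (min, max) range per tensor.

    runs: iterable of dicts mapping a tensor name to the list of float
    values observed for it during a single inference.
    """
    profile = {}
    for activations in runs:
        for name, values in activations.items():
            lo, hi = min(values), max(values)
            if name in profile:
                old_lo, old_hi = profile[name]
                lo, hi = min(lo, old_lo), max(hi, old_hi)
            profile[name] = (lo, hi)
    return profile

# Hypothetical activations from two calibration inputs.
runs = [
    {"conv1": [-0.5, 0.25, 1.0], "relu1": [0.0, 0.25, 1.0]},
    {"conv1": [-1.0, 3.0], "relu1": [0.0, 3.0]},
]
profile = gather_profile(runs)
for name, (lo, hi) in sorted(profile.items()):
    print(name, lo, hi)
```

The recorded ranges are what a quantization step would later turn into a scale and offset per tensor, so the calibration inputs should resemble the data the model will see in deployment.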

eric4337 · July 17, 2019, 8:19am

Hi @jfix, it seems Glow's ./bin/image-classifier currently supports image classification models only?
Is there a way to quantize an image generator model, or other kinds of models? Thanks.

jfix · July 18, 2019, 12:47am

Hi @eric4337, we currently also have an NMT model driver called text-translator, but it’s for pre-unrolled NMT models. We also have a model-runner driver, but it’s very simple and just for testing, for models without any non-Constant inputs.

If you want to try other models you need to create your own driver, probably based on tools/loader/ImageClassifier.cpp if you’re interested in image-based models. This mostly would mean you are able to correctly load in and out the inputs/outputs based on their expected shape(s) and datatype(s). Also, depending on the model you may need to add additional operator support.

Once you have those things done you can quantize the model.

AlexKoff88 · August 6, 2019, 4:21pm

Hi,

Look what is available here: https://github.com/opencv/openvino_training_extensions/tree/develop/pytorch_toolkit/nncf.

This is a Quantization-Aware Training implementation in PyTorch, with the ability to export the quantized model to ONNX.

blueskywwc · July 23, 2020, 2:29am

Hi,
Can you export the quantized model to onnx? thanks!