How to perform quantization of a model in PyTorch?

(Mikhail) #1

Hello everyone!

I have trained the MobileNetV2 + SSD Lite model in PyTorch from ‘https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/mobilenet_v2_ssd_lite.py’. Now I want to use it on a Raspberry Pi 3.

I converted the ‘.pth’ model into a Caffe2 model through an ONNX representation and got two files for the Caffe2 framework: init_net.pb and predict_net.pb.
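
For context, the conversion path looks roughly like this (a sketch only; the checkpoint path, class count, and input size are placeholders, and it assumes the repo’s create_mobilenetv2_ssd_lite factory and Caffe2’s onnx_graph_to_caffe2_net helper):

import torch
import onnx
from caffe2.python.onnx.backend import Caffe2Backend
from vision.ssd.mobilenet_v2_ssd_lite import create_mobilenetv2_ssd_lite

# Rebuild the network and load the trained weights (placeholder path)
net = create_mobilenetv2_ssd_lite(num_classes=21, is_test=True)
net.load_state_dict(torch.load("mb2-ssd-lite.pth", map_location="cpu"))
net.eval()

# Export to ONNX with a dummy input at the SSD-Lite resolution (300x300);
# the repo also has an onnx_compatible flag that may be needed for export
dummy = torch.randn(1, 3, 300, 300)
torch.onnx.export(net, dummy, "mb2-ssd-lite.onnx")

# Convert the ONNX graph into the two Caffe2 nets
onnx_model = onnx.load("mb2-ssd-lite.onnx")
init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model)
with open("init_net.pb", "wb") as f:
    f.write(init_net.SerializeToString())
with open("predict_net.pb", "wb") as f:
    f.write(predict_net.SerializeToString())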

As far as I know, to accelerate the model on mobile systems such as the RPi 3 (B/B+) I should use the QNNPACK library, which allows low-precision inference using operators with the int8 data type.

How to perform quantization of this model?
How can I make low-precision inference using QNNPACK?
Maybe there are some tutorials about it?

Thanks.

(Jordan Fix) #2

Hi @r3krut,

This category is for Glow, which is a different PyTorch backend from Caffe2 (which "natively integrates QNNPACK"). Glow primarily targets neural network accelerators, though it does have a CPU backend and supports automatic profiling + quantization. If you want to use QNNPACK, I believe all you need to do is make sure your model (e.g. predict_net) uses operators such as Int8Conv, Int8FC, etc., and your Caffe2 model will then use it.

However, if your network is not quantized and/or you don’t want to install Caffe2 on your Raspberry Pi, you could try to use Glow to profile your model, quantize it, and then save what we call an ahead-of-time compiled “bundle”, which is just a binary to copy to your Raspberry Pi 3 to run (see docs here and here). Note that it may not perform as well as QNNPACK; we are more focused on accelerator backends right now.

Thanks,
Jordan

#3

Is it possible to do quantization of a .pb in PyTorch and get a quantized .pb directly, like the bazel quantization tools? Any tutorial would be appreciated!

(Mikhail) #4

Thanks for the answer @jfix, but I’m a bit confused.
At the moment my model predict_net.pb does not use operators such as Int8Conv.
How do I force my model to use these operators?
Should I change operators such as Conv to Int8Conv manually?
Which file should I change: predict_net.pb or predict_net.pbtxt?

Example of some op from my predict_net.pbtxt:

op {
  input: "0"
  input: "1"
  output: "497"
  name: ""
  type: "Conv"
  arg {
    name: "strides"
    ints: 2
    ints: 2
  }
  arg {
    name: "pads"
    ints: 1
    ints: 1
    ints: 1
    ints: 1
  }
  arg {
    name: "kernels"
    ints: 3
    ints: 3
  }
  arg {
    name: "group"
    i: 1
  }
  arg {
    name: "dilations"
    ints: 1
    ints: 1
  }
}

Do I have to make changes here or not? Should I just replace type: "Conv" with type: "Int8Conv"?
Please help me figure this out. Thanks.

(Jordan Fix) #5

So it looks like your model is only in float right now. You cannot simply replace Conv with Int8Conv etc. – in order to use quantization you need to know the quantization parameters to use for each operator. In Glow we call this scale and offset; in Caffe2 it’s called Y_scale and Y_zero_point. These are usually based on the actual values you expect to flow through your graph.
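
To illustrate why those parameters matter (just a hand-rolled sketch, not Glow’s or Caffe2’s actual code): the scale/offset pair defines how float values map onto the 8-bit range, and it has to be derived from the values you expect in that tensor:

import numpy as np

def choose_qparams(min_val, max_val, qmin=0, qmax=255):
    # Map the observed float range [min_val, max_val] onto [qmin, qmax]
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # q = clamp(round(x / scale) + zero_point, qmin, qmax)
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)

# Suppose profiling showed this tensor's values fall roughly in [-2.0, 6.0]
scale, zero_point = choose_qparams(-2.0, 6.0)
print(scale, zero_point)                                       # ~0.0314, 64
print(quantize(np.array([0.0, 1.0, 5.9]), scale, zero_point))  # [ 64  96 252]

Glow’s profiling pass gathers exactly this kind of range information by running representative inputs through the network, and then uses it to pick the scales/offsets for you.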

If you don’t know what the scales/offsets should be (likely the case), one option would be to use Glow’s profiling and quantization to quantize automatically. Like I said in my previous comment:

However, if your network is not quantized and/or you don’t want to install Caffe2 on your Raspberry Pi, you could try to use Glow to profile your model, quantize it, and then save what we call an ahead-of-time compiled “bundle”, which is just a binary to copy to your Raspberry Pi 3 to run (see docs here and here). Note that it may not perform as well as QNNPACK; we are more focused on accelerator backends right now.

Again, this is not QNNPACK; Glow does not use it. If you are interested in using QNNPACK and Caffe2 on your Raspberry Pi then you could try asking the question in a separate category.

(Jordan Fix) #6

We always first import Caffe2 or ONNX protos and convert them into Glow IR, and then profile/quantize the Glow IR from there. However, once you’re in Glow IR, there is currently no way to export anything back out to Caffe2/ONNX/PyTorch protos, whether quantized or not.

If this fits your needs, you can follow this tutorial to get an ONNX or Caffe2 proto from your PyTorch model, which you can then import into Glow.

(symashayak) #7

Hi,

Is there an example of taking a float32 ONNX graph, quantizing it with Glow, and generating Glow IR? Is there a Python interface to Glow’s profiling/quantization?

Thanks,
Sikandar

(Jordan Fix) #8

You can follow the instructions here on how to gather a profile of a model and then quantize it. You just need an ONNX proto to load into Glow – see the page on Testing here, which discusses how to load a model using one of our example proto model Loaders. We have some limited support for Python via PyTorch through the ONNXIFI interface – you can find info here. Otherwise, once you’ve built Glow in C++, it’s relatively straightforward to run the Testing Loader examples linked above to quantize and run your model.