Yes, that’s my question!
I want to quantize my PyTorch model directly, instead of an ONNX or Caffe2 model.
Does that work?
@weiwei_lee You can do this e.g. by exporting from PyTorch into ONNX, and then loading the ONNX proto representation of the model into Glow – see this tutorial. This of course depends on what we support in our ONNX importer and in Glow, so not everything you export will work right out of the box.
Ok, Thx @jfix
I successfully generated profile.yaml and used it to quantize the model.
However, how do I save the "quantized model"?
Yes, that is what I want.
Is there any document explaining the quantization flow and rules at inference time in more detail?
I am curious how Glow handles this:
- Does it load the model and the yaml, then transform the model's weights into int8?
- Is the input to each layer float or int8?
- Is the multiplication rule like TFLite's, i.e. using gemmlowp to do int8*int8?
Yes, please take a look at all of our docs – I think they should answer most of your questions. Here is the doc on quantization. You gather a profile on the floating-point graph with whatever inputs you want; Glow dumps the yaml file; then you load it back in and Glow quantizes the graph given the profile you gathered. Assuming the CPU backend supports the node in quantized form, all of its inputs will be int8; otherwise conversions to/from int8 will be inserted. You can look at the generated graph to see what is and is not quantized via the graph-dumping command-line options.
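To illustrate the int8 arithmetic asked about above, here is a rough sketch, not Glow's actual code: int8 inputs, int32 accumulation, then rescaling, in the gemmlowp style. The tensors and scales are made-up values, and zero points are fixed at 0 (symmetric quantization) for simplicity.

```python
import numpy as np

def quantize(x, scale):
    """Symmetric int8 quantization: q = clip(round(x / scale))."""
    q = np.round(x / scale)
    return np.clip(q, -128, 127).astype(np.int8)

# Made-up tensors and scales; a real profile derives scales from observed ranges.
a = np.array([[0.5, -1.0], [2.0, 0.25]], dtype=np.float32)
b = np.array([[1.0, 0.0], [0.5, -0.5]], dtype=np.float32)
sa, sb = 0.02, 0.01

qa = quantize(a, sa)
qb = quantize(b, sb)

# int8 x int8 products are widened and accumulated in int32, as in
# gemmlowp-style kernels (with zero points of 0, the zero-point
# correction terms vanish).
acc = qa.astype(np.int32) @ qb.astype(np.int32)

# Rescale the int32 accumulator back to float; a real pipeline would
# requantize it to int8 with the next layer's scale instead.
result = acc * (sa * sb)

print(np.allclose(result, a @ b, atol=0.05))  # True
```

The rescale factor `sa * sb` is what ties the integer accumulator back to the floating-point result, which is why each tensor's scale must be known at compile time.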
Thanks for your clear explanation!
I am trying to quantize another PyTorch model with the approach described in this post. I have Glow built and the model converted to .onnx; what should I put as the -model-input-name? When I do
./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m='path/to/my/model' -model-input-name=gpu_0/data -dump-profile="profile.yaml"
I got the error file: ~/glow/lib/Importer/ProtobufLoader.cpp line: 33 message: There is no tensor registered with name 0.
I assume the input name was incorrect?
Yes, it's probably due to -model-input-name. You'll need to take a look at the proto itself to find the name of the external input that is the image, i.e. the input to the first operator of the model. In most of our image classification models this is gpu_0/data. You can see some examples in our image-classifier run script here.
@jfix Thank you for the quick response! How do I get the input name of the model? I have tried print(model) and for name, param in model.named_parameters(): print(name, param.data.shape); both start from the actual layers, not including the input. Sorry, it's a basic question.
You could use something like Netron to view your protobuf and see what the very first operator's input is (see the image below, from the very start of a Caffe2 ResNet-50 model – you'd use gpu_0/data). Otherwise you should be able to just inspect the protobuf text of the model to see the input name of the first operator – it should be an external input.
Thank you for your help @jfix! That works wonders!
However, I eventually got an error: glow/lib/Importer/ONNXModelLoader.cpp line: 896 error code: MODEL_LOADER_UNSUPPORTED_OPERATOR message: Failed to load operator. I guess it means some of the operations in my model are not supported at this point?
Yeah, some op is unsupported – what options were you using for the image-classifier? Was this when trying to get the profile (-dump-profile), or when loading the profile for quantization (-load-profile)? Have you tried just running it in fp32 (i.e. without the mentioned options)?
Given the MODEL_LOADER_UNSUPPORTED_OPERATOR, the operator is unsupported in fp32 too, and we'd need to add support to the importer for it. We need to improve our error messages here – in the meantime, could you add a simple print statement just before ONNXModelLoader::LoadOperator() to print the typeName and see what it is reporting as unsupported?
Also, feel free to open an issue for supporting whatever op it is on Github!
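For reference, the two-step flow those two flags implement looks roughly like this. This is a command sketch, not verbatim from the thread: the model path is a placeholder, and the exact flags your build accepts may differ.

```shell
# Step 1: run representative inputs through the fp32 graph and record the profile.
./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 \
    -m='path/to/my/model' -model-input-name=gpu_0/data \
    -dump-profile="profile.yaml"

# Step 2: load the profile back in to quantize the graph and run int8 inference.
./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 \
    -m='path/to/my/model' -model-input-name=gpu_0/data \
    -load-profile="profile.yaml"
```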
Thanks @jfix for clarifying it! I got the error when running ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m='path/to/my/model' -model-input-name=gpu_0/data -dump-profile="profile.yaml".
The model I am trying to quantize is mobilefacenet, built with grouped convolutions and dense blocks.
Got it, cool. Well like I said, if you find out what op is being reported as unsupported please open an issue with it on GH.
Look at what is available here: https://github.com/opencv/openvino_training_extensions/tree/develop/pytorch_toolkit/nncf.
This is Quantization-Aware Training in PyTorch, with the ability to export the quantized model to ONNX.
There are many results there, including a ready-to-use ResNet-50 config for quantization.
Is it possible to save the quantized model as a readable file? E.g. a protobuf file where I can see the scales and zero points of each layer
We do have an ONNX exporter, but I’m not sure how nicely it handles quantized nodes. If you’re interested in using this I would create an issue on Github asking about this, tagging @putivsky.
That said, if your goal is just to look at the scales and zero points, you can do that by quantizing the model and then dumping a dot file of the DAG to see all of them, e.g. if using one of our Loaders.
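To make those numbers concrete, here is a small sketch, not Glow's implementation, of how an affine scale/zero-point pair is typically derived from a profiled min/max range for an int8 tensor. The profiled range below is a hypothetical example.

```python
def choose_quant_params(min_val, max_val, qmin=-128, qmax=127):
    """Derive an affine (scale, zero_point) pair from a profiled float range.

    The range is widened to include 0.0 so that real zero is exactly
    representable, which affine int8 schemes generally require (e.g. so
    zero-padding stays exact).
    """
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

# Hypothetical profiled range for one layer's output.
scale, zero_point = choose_quant_params(-0.5, 1.5)
print(scale, zero_point)  # ~0.00784, -64
```

These are exactly the kinds of per-tensor values you would see attached to each node in the dumped dot file.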
When I use the image-classifier with only one image.png as input it works fine and generates the profile file, but if I give more than one image as input it gives an error.
So is the profile file obtained with only one image a correct one that I can proceed with, or not? What difference does it make?
Below are the command used and the error.
Thanks for reading.
Command used:
"D:\IMXRT\nxp\Glow\bin\image-classifier.exe images\0_1009.png images\1_1008.png images\2_1065.png images\3_1020.png images\4_1059.png images\5_1087.png images\6_1099.png images\7_1055.png images\8_1026.png images\9_1088.png -image-mode=0to1 -image-layout=NCHW -image-channel-order=BGR -model=models\mnist_7.onnx -model-input-name=Input3 -dump-profile=profile.yml"
Running 1 thread(s).
name : Times212_reshape0
Input : float<10 x 16 x 4 x 4>
Dims : [1, 256]
Layout : *
users : 1
Result : float<1 x 256>
Reshape into a different size
LHS Equal RHS with:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0603 14:32:10.634238 15296 Error.cpp:119] exitOnError(Error) got an unexpected ErrorValue:
Error message: Function verification failed.
The number of images you provide in the command are used to determine the batch size to use when compiling the model. Your command line passes 10 images, so it tries to use a batch size of 10.
Some models have specific ops that assume/require a specific batch size, e.g. if there’s a Resize op in the model. So if the original model expected batch size 1 and the model has such an op that assumed batch size of 1, then you have to use batch size 1 as well in Glow. I’m assuming this is what is happening for your model.
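The failure above can be reproduced outside Glow: a Reshape whose target shape was baked in for batch size 1 cannot accept a batch of 10. A minimal numpy sketch, with shapes taken from the error message:

```python
import numpy as np

# The model's Reshape was compiled for batch size 1:
# target shape [1, 256] holds exactly 1*16*4*4 = 256 elements.
x1 = np.zeros((1, 16, 4, 4), dtype=np.float32)
print(x1.reshape(1, 256).shape)  # (1, 256): batch size 1 fits

# A batch of 10 images has 10*16*4*4 = 2560 elements and cannot be
# reshaped into [1, 256], which is why graph verification fails.
x10 = np.zeros((10, 16, 4, 4), dtype=np.float32)
try:
    x10.reshape(1, 256)
except ValueError:
    print("reshape failed for batch size 10")
```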
If you are OK with using a batch size of 1, then you can pass
-minibatch=1 in your command. This will tell Glow to compile with a batch size of 1 no matter how many images are passed in the command, and then it will run them one by one. This would need to be used for both the profiling (-dump-profile) and quantization (-load-profile) runs.