Looking at a quantized TorchScript model (produced with jit.trace), is there a way to know whether the model was quantized for the fbgemm or qnnpack backend? `reduce_range` is another parameter that affects how the scales/offsets are interpreted by a "new" backend. Is there a way to know whether reduce_range was on or off at the time of quantization? This information could be node-based or graph-based.

Hi Saurabh,

If you print out the prepared model graph you should be able to see what observers were inserted. I don’t think there’s a way to check which backend the model was quantized for specifically, but you can check what the quant_min and quant_max values were set to in the observers this way. The `reduce_range` attribute is now deprecated and replaced with `quant_min` and `quant_max`. Let me know if that helps.
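For example, something like the sketch below (assuming eager-mode post-training static quantization; `fuse_model()` and the `activation_post_process` attribute are the usual eager-mode conventions, so adjust for your workflow) shows the observers and their bounds:

```
import torch
import torchvision

# Load the float model and prepare it for static quantization.
model = torchvision.models.quantization.googlenet(pretrained=True, quantize=False)
model.eval()
model.fuse_model()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)

# Printing shows the inserted observer modules inline.
print(prepared)

# The observers can also be walked programmatically; recent versions
# expose quant_min/quant_max directly on the observer.
for name, module in prepared.named_modules():
    obs = getattr(module, "activation_post_process", None)
    if obs is not None:
        print(name, getattr(obs, "quant_min", None), getattr(obs, "quant_max", None))
```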

Best,

-Andrew

Hi @andrewor,

I can work with quant_min/quant_max, but I can’t find that information either. I was specifically looking for an API that can get me this information programmatically.
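(One thing I did find is `torch.backends.quantized.engine`, but as far as I can tell that is only the runtime engine selection; it doesn’t record what the model was quantized with.)

```
import torch

# Runtime engine selection only; it does not record how a given
# model was quantized.
print(torch.backends.quantized.engine)  # e.g. 'fbgemm' or 'qnnpack'
```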

Let’s consider the example below for a pre-quantized model from torchvision:

```
import torchvision
import torch

# Load a pre-quantized GoogLeNet from torchvision.
model = torchvision.models.quantization.googlenet(pretrained=True, quantize=True)
print(model)
```

This prints:

```
QuantizableGoogLeNet(
  (conv1): QuantizableBasicConv2d(
    (conv): QuantizedConvReLU2d(3, 64, kernel_size=(7, 7), stride=(2, 2), scale=0.08655554801225662, zero_point=0, padding=(3, 3))
    (bn): Identity()
    (relu): Identity()
  )
  (maxpool1): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
  (conv2): QuantizableBasicConv2d(
    (conv): QuantizedConvReLU2d(64, 64, kernel_size=(1, 1), stride=(1, 1), scale=0.05618245154619217, zero_point=0)
    (bn): Identity()
    (relu): Identity()
  )
  (conv3): QuantizableBasicConv2d(
    (conv): QuantizedConvReLU2d(64, 192, kernel_size=(3, 3), stride=(1, 1), scale=0.05060958117246628, zero_point=0, padding=(1, 1))
    (bn): Identity()
    (relu): Identity()
  )
  ...
```

It has no information about quant_min/quant_max, reduce_range, or the engine (assuming it was quantized with the default settings for the corresponding backend).
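The closest I can get is inspecting the quantized tensors themselves. A sketch (relying on the standard `weight()` accessor of the quantized conv modules): the dtype and the integer range actually occupied hint at quant_min/quant_max, but don’t prove what the configured bounds were:

```
# Inspect the quantized weight of the first conv. This hints at the
# bounds (e.g. values confined to [-64, 63] would suggest a reduced
# range) but does not recover the configured quant_min/quant_max.
w = model.conv1.conv.weight()
print(w.dtype)  # e.g. torch.qint8
print(w.int_repr().min().item(), w.int_repr().max().item())
```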

```
# Trace the quantized model to TorchScript.
input_shape = [1, 3, 224, 224]
x = torch.randn(input_shape)
model.eval()
traced_mod = torch.jit.trace(model, x)
print(traced_mod)
```

With the JIT-traced TorchScript model, a print doesn’t give the scale/offset information either. There are other ways to get scale/offset (one is sketched after the output below), but I’m not sure about the quant_min/quant_max values or the engine.

```
QuantizableGoogLeNet(
  original_name=QuantizableGoogLeNet
  (conv1): QuantizableBasicConv2d(
    original_name=QuantizableBasicConv2d
    (conv): RecursiveScriptModule(original_name=ConvReLU2d)
    (bn): Identity(original_name=Identity)
    (relu): Identity(original_name=Identity)
  )
  ...
```
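For the record, the way I get scale/offset from the traced model is to walk the graph (a sketch; it matches what I see for statically quantized models, where the scales and zero points appear as constant inputs to the quantize ops):

```
# Walk the inlined graph looking for quantize ops. Their scale and
# zero_point show up as constants, but nothing in the graph records
# quant_min/quant_max or the backend the model was quantized for.
for node in traced_mod.inlined_graph.nodes():
    if node.kind() == "aten::quantize_per_tensor":
        print(node)
```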