[JIT] Tracing quantized batchnorm issue

#jit #quantization #mobile

Hello everyone,

After getting guidance on deploying quantized models on mobile, I decided to try a quantized TorchScript model.

What I have is an EfficientNet backbone that was quantized with the QAT tools and the qnnpack config. After quantization I traced it with torch.jit.script and saved it for later deployment. I also traced the same model without quantization.

I tried both on mobile.
The non-quantized model works well, but the quantized one fails with an error:

HWHMA:/data/local/Detectron2Mobile # ./speed_benchmark_torch --model=./ts_models/orig_backbone.pt --input_dims="1,3,512,768" --input_type=float --warmup=10 --iter=10                                      
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 2146.44. Iters per second: 0.465887
/speed_benchmark_torch --model=./ts_models/quant_backbone.pt --input_dims="1,3,512,768" --input_type=float --warmup=10 --iter=10
terminating with uncaught exception of type torch::jit::ErrorReport: 
Unknown builtin op: quantized::batch_norm.
Here are some suggestions: 

The original call is:
Serialized   File "code/__torch__/torch/nn/quantized/modules/batchnorm.py", line 14
    _1 = self.running_mean
    _2 = self.bias
    input = ops.quantized.batch_norm(argument_1, self.weight, _2, _1, _0, 1.0000000000000001e-05, 0.44537684321403503, 129)
            ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return input


There is no quantized::batch_norm op, but there are quantized::batch_norm2d/3d. In the working (non-quantized) case the traced code looks like this:

def forward(self,
    argument_1: Tensor) -> Tensor:
  _0 = self.running_var
  _1 = self.running_mean
  _2 = self.bias
  input = torch.batch_norm(argument_1, self.weight, _2, _1, _0, False, 0.10000000000000001, 1.0000000000000001e-05, True)
  return input

What is happening in the failing case? Is it a wrong substitution during tracing, or is there an issue with quantized::batch_norm?


I also tried to convert the traced graph to Caffe2 and ran into the same problem.

The model traced with torch.jit.script cannot be converted to Caffe2 because of batchnorm.

model = torch.jit.load(buf)
f = io.BytesIO()
torch.onnx.export(model, x, f, example_outputs=outputs,

where model is the traced backbone.

~/anaconda2/envs/pytorch-gpu/lib/python3.7/site-packages/torch/onnx/utils.py in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict)
    157             torch.onnx.symbolic_helper._quantized_ops.clear()
    158             # Unpack quantized weights for conv and linear ops and insert into graph.
--> 159             torch._C._jit_pass_onnx_unpack_quantized_weights(graph, params_dict)
    161             # Insert permutes before and after each conv op to ensure correct order.

RuntimeError: false INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1586761698468/work/torch/csrc/jit/passes/onnx/unpack_quantized_weights.cpp:99, please report a bug to PyTorch. Unrecognized quantized operator while trying to compute q_scale for operator quantized::batch_norm

For the first question: are you using a nightly build of PyTorch? The op name was updated in https://github.com/pytorch/pytorch/pull/36494 to quantized::batch_norm2d to make it more consistent with the implementation. You might have to redo the QAT convert with the same PyTorch build that you run on mobile, so that both sides use the same op name.
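
One way to check which batch norm op name a given build emits is to list the operator kinds in the TorchScript graph before saving. The snippet below is only a sketch: collect_op_kinds is a hypothetical helper, and the toy float model is a stand-in for the real backbone, so it will show the float op rather than a quantized one.

```python
import torch
import torch.nn as nn


def collect_op_kinds(graph):
    """Return the set of operator names (e.g. 'aten::batch_norm') used in a TorchScript graph."""
    kinds = set()

    def visit(block):
        for node in block.nodes():
            kinds.add(node.kind())
            # Control-flow nodes keep their bodies in nested blocks.
            for sub_block in node.blocks():
                visit(sub_block)

    visit(graph)
    return kinds


# Hypothetical stand-in for the real backbone (float, not quantized).
toy = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4)).eval()
traced = torch.jit.trace(toy, torch.randn(1, 3, 8, 8))

# inlined_graph flattens submodule calls so the underlying ops become visible.
op_kinds = collect_op_kinds(traced.inlined_graph)
batch_norm_ops = sorted(k for k in op_kinds if "batch_norm" in k)
print(batch_norm_ops)
```

Running the same helper on the converted quantized model in the export environment should show whether it contains quantized::batch_norm or quantized::batch_norm2d. This has to be done on the exporting side, because loading the file with a runtime that lacks the op fails before the graph can be inspected.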

For the second question: we currently do not have the quantized PyTorch to Caffe2 conversion flow working for the quantized::batch_norm2d operator, mainly because the Caffe2 quantized ops currently don't include this operator.

@supriyar thank you!

My build was from just before that PR. Now I can see that quantized::batch_norm2d has appeared, and the model works on the mobile device.

@supriyar would you mind discussing how to convert models for now? Is it possible to exclude the bn operators and reach the same functionality with int8_conv_op_relu only?

As was done in this example model: https://github.com/caffe2/models/tree/master/resnet50_quantized

UPD: I think the answer is to use the fused modules from torch.nn.intrinsic.
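
A minimal sketch of what that fusion looks like, assuming a toy conv + bn + relu block (ConvBnRelu is a hypothetical stand-in, not the EfficientNet backbone) and the eval-mode path of torch.quantization.fuse_modules. The QAT flow fuses in training mode and may behave differently depending on the PyTorch version, so treat this as an illustration of the idea only.

```python
import torch
import torch.nn as nn


class ConvBnRelu(nn.Module):
    """Hypothetical toy block standing in for one backbone stage."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


torch.manual_seed(0)
model = ConvBnRelu()
with torch.no_grad():
    # Give batch norm non-trivial statistics so the folding is actually exercised.
    model.bn.running_mean.uniform_(-1.0, 1.0)
    model.bn.running_var.uniform_(0.5, 2.0)
model.eval()  # this sketch uses the eval-mode fusion path

# Returns a fused copy: bn is folded into the conv weights and the
# conv + relu pair becomes a single intrinsic module.
fused = torch.quantization.fuse_modules(model, [["conv", "bn", "relu"]])

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    max_diff = (model(x) - fused(x)).abs().max().item()

print(type(fused.conv).__name__, type(fused.bn).__name__, max_diff)
```

After fusion there is no standalone batch norm module left in the block, so converting the fused model should not produce a separate quantized batch norm op for it.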
