TorchScript and eager mode mix during ONNX export

Hi, everyone,

#jit #quantization

I am struggling with exporting a quantized model from PyTorch to Caffe2. It is clear by now that a quantized model should first be traced and then exported via ONNX to Caffe2. But what if, for some reason, it is impossible to trace some part of the network, which might itself be non-quantized? Could we possibly mix TorchScript modules and eager-mode modules in an ONNX export?
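
To make the question concrete, here is a minimal sketch of the kind of mixing I mean: scripting only the submodule that cannot be traced, then tracing the parent module. All class names here are hypothetical stand-ins for the real network.

```python
import torch
import torch.nn as nn

class QuantizedBackbone(nn.Module):
    # stand-in for the traceable (quantized) part of the network
    def forward(self, x):
        return torch.relu(x)

class NonTraceablePart(nn.Module):
    # stand-in for a part with data-dependent control flow,
    # which tracing cannot capture
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return x

class Mixed(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = QuantizedBackbone()
        # script the non-traceable part; the scripted submodule keeps
        # its control flow when the parent module is traced
        self.head = torch.jit.script(NonTraceablePart())

    def forward(self, x):
        return self.head(self.backbone(x))

traced = torch.jit.trace(Mixed(), torch.randn(1, 3))
```

Is this the intended pattern, and does it survive the ONNX export step?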

I also have another question, though it is not related to the topic title. I have also tried to convert the whole traced model, but ONNX throws an error that I am unable to understand. The steps are the following:

  1. I traced the quantized PyTorch model.
  2. I called `torch.onnx.export` with `opset_version=11` and `operator_export_type=OperatorExportTypes.ONNX_ATEN_FALLBACK`.

This produces the following error:

  File "./tools/torchscript_converter.py", line 118, in <module>
    onnx_model = export_onnx_model_with_torchscript(cfg, torch_model, first_batch)
  File "/root/some_detectron2/detectron2/export/api.py", line 180, in export_onnx_model_with_torchscript
    return Caffe2Tracer(cfg, model, inputs).export_onnx_with_torchscript()
  File "/root/some_detectron2/detectron2/export/api.py", line 138, in export_onnx_with_torchscript
    return export_onnx_model_impl(traced_model, (inputs,))
  File "/root/some_detectron2/detectron2/export/caffe2_export.py", line 67, in export_onnx_model
    export_params=True,
  File "/root/anaconda2/envs/pytorch-gpu/lib/python3.7/site-packages/torch/onnx/__init__.py", line 172, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/root/anaconda2/envs/pytorch-gpu/lib/python3.7/site-packages/torch/onnx/utils.py", line 92, in export
    use_external_data_format=use_external_data_format)
  File "/root/anaconda2/envs/pytorch-gpu/lib/python3.7/site-packages/torch/onnx/utils.py", line 552, in _export
    _check_onnx_proto(proto)
RuntimeError: Attribute 'kernel_shape' is expected to have field 'ints'

==> Context: Bad node spec: input: "441" input: "7" output: "442" op_type: "Conv" attribute { name: "dilations" ints: 1 ints: 1 type: INTS } attribute { name: "group" i: 32 type: INT } attribute { name: "
kernel_shape" type: INTS } attribute { name: "pads" ints: 1 ints: 1 ints: 1 ints: 1 type: INTS } attribute { name: "strides" ints: 1 ints: 1 type: INTS } 

And the relevant part of the debug log is:

...
  %442 : Tensor = onnx::Conv[dilations=[1, 1], group=32, kernel_shape=annotate(List[int], []), pads=[1, 1, 1, 1], strides=[1, 1]](%441, %7) # /root/anaconda2/envs/pytorch-gpu/lib/python3.7/site-packages/torch/nn/modules/conv.py:348:0
  %443 : Tensor = onnx::BatchNormalization[epsilon=1.0000000000000001e-05, momentum=0.90000000000000002](%442, %8, %9, %10, %11) # /root/anaconda2/envs/pytorch-gpu/lib/python3.7/site-packages/torch/nn/functional.py:1957:0
  %444 : Tensor = onnx::Relu(%443) # /root/anaconda2/envs/pytorch-gpu/lib/python3.7/site-packages/torch/nn/functional.py:1061:0
...

If it is impossible to tell what's wrong from this, could you please guide me on how to localize the issue?