TorchVision is exporting ONNX with weights as tensors

Hi all,

I’m quite new to PyTorch and already have an interesting challenge ahead: I’m trying to run Mask R-CNN (the torchvision implementation) on the NVIDIA TensorRT SDK. I’ve already reported an issue with them, and the initial feedback is that TensorRT doesn’t accept weights exported as tensors.

Is there a way to export the model (https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html) with Python types instead?

Is there any tool I can run afterwards to convert the weights?

I’d really appreciate any feedback.

Thanks a lot

To export scripted models you could have a look at TRTorch.
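If you want to try that route, the rough flow is: script the model with torch.jit.script, then hand the TorchScript module to TRTorch. A minimal sketch, assuming the compile-spec format from the TRTorch README ("input_shapes" and "op_precision" are taken from its docs, so double-check them against the version you install; resnet18 just stands in for your model):

import torch
import torchvision
import trtorch  # https://github.com/NVIDIA/TRTorch

# TRTorch consumes TorchScript, not ONNX, so script the model first
model = torchvision.models.resnet18(pretrained=True).eval().cuda()
script_module = torch.jit.script(model)

# Compile the TorchScript module down to a TensorRT engine;
# the exact spec keys may differ between TRTorch releases
compile_settings = {
    "input_shapes": [(1, 3, 224, 224)],
    "op_precision": torch.float32,
}
trt_module = trtorch.compile(script_module, compile_settings)

# The compiled module is called like any other TorchScript module
output = trt_module(torch.rand(1, 3, 224, 224).cuda())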

I’m unsure what this part means, though: “TensorRT doesn’t accept weights exported as tensors”.

Hi @ptrblck,

trtexec is failing even for simple models, and it seems to be something about the weights. NVIDIA support answered: “(...) Looks like the issue is with weights, and TRT currently does not support convolutions where the weights are tensors. (...)” and referred to this issue: https://forums.developer.nvidia.com/t/unable-to-convert-onnx-model-to-tensorrt/142431/2

I tried with a simpler model and the same issue happens:

import torch
import torch.nn as nn

class MinimalModel(nn.Module):
    """A single ConstantPad2d layer is enough to reproduce the issue."""
    def __init__(self):
        super(MinimalModel, self).__init__()
        self.constant_zero_pad = nn.ConstantPad2d((1, 0, 0, 0), 0)

    def forward(self, input_tensor):
        return self.constant_zero_pad(input_tensor)

minimal_model = MinimalModel()
minimal_model = nn.DataParallel(minimal_model)
minimal_model.cuda()

# Random deep feature
input_tensor = torch.rand((1, 32, 128, 128))

# Check the model can do a forward pass
minimal_model(input_tensor)

# Export to ONNX (unwrap the DataParallel wrapper first)
torch.onnx.export(
    minimal_model.module,
    (input_tensor,),
    'model.onnx',
    export_params=True, verbose=True, training=False, opset_version=11
)

And then I ran the trtexec command:

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=engine.trt --explicitBatch

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=engine.trt --explicitBatch
[08/04/2020-10:15:12] [I] === Model Options ===
[08/04/2020-10:15:12] [I] Format: ONNX
[08/04/2020-10:15:12] [I] Model: model.onnx
[08/04/2020-10:15:12] [I] Output:
[08/04/2020-10:15:12] [I] === Build Options ===
[08/04/2020-10:15:12] [I] Max batch: explicit
[08/04/2020-10:15:12] [I] Workspace: 16 MB
[08/04/2020-10:15:12] [I] minTiming: 1
[08/04/2020-10:15:12] [I] avgTiming: 8
[08/04/2020-10:15:12] [I] Precision: FP32
[08/04/2020-10:15:12] [I] Calibration: 
[08/04/2020-10:15:12] [I] Safe mode: Disabled
[08/04/2020-10:15:12] [I] Save engine: engine.trt
[08/04/2020-10:15:12] [I] Load engine: 
[08/04/2020-10:15:12] [I] Builder Cache: Enabled
[08/04/2020-10:15:12] [I] NVTX verbosity: 0
[08/04/2020-10:15:12] [I] Inputs format: fp32:CHW
[08/04/2020-10:15:12] [I] Outputs format: fp32:CHW
[08/04/2020-10:15:12] [I] Input build shapes: model
[08/04/2020-10:15:12] [I] Input calibration shapes: model
[08/04/2020-10:15:12] [I] === System Options ===
[08/04/2020-10:15:12] [I] Device: 0
[08/04/2020-10:15:12] [I] DLACore: 
[08/04/2020-10:15:12] [I] Plugins:
[08/04/2020-10:15:12] [I] === Inference Options ===
[08/04/2020-10:15:12] [I] Batch: Explicit
[08/04/2020-10:15:12] [I] Input inference shapes: model
[08/04/2020-10:15:12] [I] Iterations: 10
[08/04/2020-10:15:12] [I] Duration: 3s (+ 200ms warm up)
[08/04/2020-10:15:12] [I] Sleep time: 0ms
[08/04/2020-10:15:12] [I] Streams: 1
[08/04/2020-10:15:12] [I] ExposeDMA: Disabled
[08/04/2020-10:15:12] [I] Spin-wait: Disabled
[08/04/2020-10:15:12] [I] Multithreading: Disabled
[08/04/2020-10:15:12] [I] CUDA Graph: Disabled
[08/04/2020-10:15:12] [I] Skip inference: Disabled
[08/04/2020-10:15:12] [I] Inputs:
[08/04/2020-10:15:12] [I] === Reporting Options ===
[08/04/2020-10:15:12] [I] Verbose: Disabled
[08/04/2020-10:15:12] [I] Averages: 10 inferences
[08/04/2020-10:15:12] [I] Percentile: 99
[08/04/2020-10:15:12] [I] Dump output: Disabled
[08/04/2020-10:15:12] [I] Profile: Disabled
[08/04/2020-10:15:12] [I] Export timing to JSON file: 
[08/04/2020-10:15:12] [I] Export output to JSON file: 
[08/04/2020-10:15:12] [I] Export profile to JSON file: 
[08/04/2020-10:15:12] [I] 
----------------------------------------------------------------
Input filename:   model.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.6
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[08/04/2020-10:15:13] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/04/2020-10:15:13] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
ERROR: builtin_op_importers.cpp:2179 In function importPad:
[8] Assertion failed: inputs.at(1).is_weights()
[08/04/2020-10:15:13] [E] Failed to parse onnx file
[08/04/2020-10:15:13] [E] Parsing model failed
[08/04/2020-10:15:13] [E] Engine creation failed
[08/04/2020-10:15:13] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=engine.trt --explicitBatch
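
The assertion seems to say that the second input of the Pad node, i.e. the pads themselves, arrives as a tensor computed at runtime rather than as constant weights. To see where the pads come from, the exported graph can be walked with the onnx package (a rough sketch, nothing model-specific):

import onnx

model = onnx.load('model.onnx')
graph = model.graph
initializer_names = {init.name for init in graph.initializer}
# Map every tensor name to the node that produces it
produced_by = {out: node for node in graph.node for out in node.output}

for node in graph.node:
    if node.op_type == 'Pad':
        pads = node.input[1]  # in opset 11 the pads are an input, not an attribute
        if pads in initializer_names:
            print(pads, '-> constant initializer (weights)')
        else:
            producer = produced_by.get(pads)
            origin = producer.op_type if producer is not None else 'graph input'
            print(pads, '-> runtime tensor produced by:', origin)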

So, my question is: is there another way to save the weights, or some converter that satisfies the TensorRT requirements?

Thank you very much.

Unfortunately, I’m not familiar enough with onnx-tensorrt, but it seems the linked thread mentions a potential workaround.
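
One thing that is often suggested for this class of error (I can’t verify it against your exact model) is to run the exported file through onnx-simplifier, whose constant folding should collapse the subgraph that computes the pads into a plain initializer, which is what TensorRT’s Pad importer expects:

# pip install onnx-simplifier
import onnx
from onnxsim import simplify

model = onnx.load('model.onnx')

# Constant folding collapses the ops computing the pads into an initializer
model_simplified, check = simplify(model)
assert check, 'simplified ONNX model failed the validation check'

onnx.save(model_simplified, 'model_simplified.onnx')

Also, Pad only takes its pads as a runtime input from opset 11 onward (in opset 10 and earlier they are a node attribute), so re-exporting with opset_version=10 might sidestep the assertion as well.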