Error loading TensorRT-compiled model

Hi,
I’m trying to compile the yolor model from this repo with Torch-TensorRT: GitHub - WongKinYiu/yolor: implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206).
I first trace the model with:

 traced_model = torch.jit.trace(model, img)

and compile it with:

compile_spec = {
    "inputs": [torch_tensorrt.Input((1, 3, 384, 640), dtype=torch.half)],
    "enabled_precisions": {torch.half},
    "truncate_long_and_double": True,
}

trt_model = torch_tensorrt.compile(traced_model, **compile_spec)
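
Roughly, the test looks like this (a simplified sketch, not the exact benchmark code; it assumes img is the same tensor used for tracing, moved to the GPU and cast to half to match the compile spec):

import time
import torch

# Quick sanity check / timing of the compiled module.
inp = img.half().cuda()
with torch.no_grad():
    trt_model(inp)                 # warm-up + correctness check
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        out = trt_model(inp)
    torch.cuda.synchronize()
print(f"avg latency: {(time.time() - start) / 100 * 1000:.2f} ms")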

It seems to work and is ~5x faster than the original model. So I save it with:

torch.jit.save(trt_model, "yolo_trt.ts")

So far, so good. But when I try to load the model on the same machine in a different Python program:

trt_net = torch.jit.load('yolo_trt.ts')

I get the error:

Traceback (most recent call last):
  File "/yolor/testtrt.py", line 14, in <module>
    trt_net = torch.jit.load('yolo_trt.ts')
  File "/miniconda3/envs/trt/lib/python3.10/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:132] Expected (binding_name == engine_binded_name) to be true but got false
Could not find a TensorRT engine binding for output named output_1

I don’t understand this, because the TensorRT compilation itself seems to work. Is there a mistake in how I save and load the model?

Try enabling debug logging by wrapping the failing code in with torch_tensorrt.logging.debug(): to get more information about the failure.
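
For example, a minimal sketch around the failing load:

import torch
import torch_tensorrt

# Enable verbose Torch-TensorRT logging just for the deserialization step.
with torch_tensorrt.logging.debug():
    trt_net = torch.jit.load('yolo_trt.ts')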

Thank you for your reply. I got the debug logs, but I don’t see anything that could cause the error.

DEBUG: [Torch-TensorRT] - Deserializing Device Info: 0%7%0%0%Tesla V100S-PCIE-32GB
DEBUG: [Torch-TensorRT] - Deserialized Device Info: Device(ID: 0, Name: Tesla V100S-PCIE-32GB, SM Capability: 7.0, Type: GPU)
DEBUG: [Torch-TensorRT] - Target Device: Device(ID: 0, Name: Tesla V100S-PCIE-32GB, SM Capability: 7.0, Type: GPU)
DEBUG: [Torch-TensorRT] - Setting Device(ID: 0, Name: Tesla V100S-PCIE-32GB, SM Capability: 7.0, Type: GPU) as active device
INFO: [Torch-TensorRT] - Loaded engine size: 72 MiB
DEBUG: [Torch-TensorRT] - Trying to load shared library libcudnn.so.8
DEBUG: [Torch-TensorRT] - Loaded shared library libcudnn.so.8
DEBUG: [Torch-TensorRT] - Using cuDNN as plugin tactic source
DEBUG: [Torch-TensorRT] - Using cuDNN as core library tactic source
INFO: [Torch-TensorRT] - [MemUsageChange] Init cuDNN: CPU +503, GPU +226, now: CPU 2278, GPU 1257 (MiB)
WARNING: [Torch-TensorRT] - TensorRT was linked against cuDNN 8.6.0 but loaded cuDNN 8.5.0
DEBUG: [Torch-TensorRT] - Deserialization required 713725 microseconds.
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +70, now: CPU 0, GPU 70 (MiB)
DEBUG: [Torch-TensorRT] - Trying to load shared library libcudnn.so.8
DEBUG: [Torch-TensorRT] - Loaded shared library libcudnn.so.8
DEBUG: [Torch-TensorRT] - Using cuDNN as plugin tactic source
DEBUG: [Torch-TensorRT] - Using cuDNN as core library tactic source
INFO: [Torch-TensorRT] - [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2279, GPU 1257 (MiB)
WARNING: [Torch-TensorRT] - TensorRT was linked against cuDNN 8.6.0 but loaded cuDNN 8.5.0
DEBUG: [Torch-TensorRT] - Total per-runner device persistent memory is 0
DEBUG: [Torch-TensorRT] - Total per-runner host persistent memory is 508992
DEBUG: [Torch-TensorRT] - Allocated activation device memory of size 22800384
INFO: [Torch-TensorRT] - [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +22, now: CPU 0, GPU 92 (MiB)
WARNING: [Torch-TensorRT] - CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
DEBUG: [Torch-TensorRT] - Input binding name: input_0 (pyt arg idx: 0)
DEBUG: [Torch-TensorRT] - Output binding name: output_0 (pyt return idx: 1)
Traceback (most recent call last):
  File "/yolor/testtrt.py", line 14, in <module>
    trt_net = torch.jit.load('yolo_trt.ts')
  File "/miniconda3/envs/trt/lib/python3.10/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:132] Expected (binding_name == engine_binded_name) to be true but got false
Could not find a TensorRT engine binding for output named output_1

Do you see anything in the logs that gives more information about the error?

No, I don’t see any obvious issues in the logs, but maybe @narendasan would see something or have more debugging ideas.

Is this with Torch-TensorRT 1.3.0?

Yes, torch-tensorrt 1.3.0 and TensorRT 8.4.3.

If possible, I would suggest using the nightly version 1.4.0.dev0. I believe there were a couple of fixes to the runtime that address similar issues.

I'm hitting the same issue with torch_tensorrt 1.3.0. Did you solve it?
For me, 1.4.0 causes other problems: 🐛 [Bug] Regression : Torch-TensorRT now fail to convert due to unsupported negative pad for torch.nn.ConstantPad2d · Issue #2079 · pytorch/TensorRT · GitHub
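
For context, the pattern that issue refers to is roughly a torch.nn.ConstantPad2d with negative padding (i.e. cropping), something like:

import torch
import torch.nn as nn

# ConstantPad2d with a negative pad crops a border instead of adding one;
# this is the pattern issue #2079 reports as failing to convert in 1.4.0.
crop = nn.ConstantPad2d(-1, 0.0)
x = torch.randn(1, 3, 8, 8)
print(crop(x).shape)   # torch.Size([1, 3, 6, 6])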

So at the moment the only version that works for me is torch_tensorrt 1.2.0, but it requires Python < 3.9, and that will be a big problem going forward.