Unhelpful error when trying to use torch_tensorrt.Input with torch_tensorrt.compile

When using this code, I get an error:

    input = torch_tensorrt.Input(
        min_shape=(1, 3, config.model_image_size, config.model_image_size),
        opt_shape=(16, 3, config.model_image_size, config.model_image_size),
        max_shape=(16, 3, config.model_image_size, config.model_image_size),
        dtype=torch.half, name="x")

    trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=[input], enabled_precisions = {torch.half, torch.float}, output_format="torchscript")

    torch.jit.save(trt_gm, "trt_model.ts")
  File "/mnt/c/Coding/Testing/PyTorch/MultiClassImageClassification/src/imclaslib/models/multilabel_classifier.py", line 22, in forward
    image_features = self.base_model(x)  # [batch_size, feature_dim]
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/timm/models/maxxvit.py", line 1264, in forward
    x = self.forward_features(x)
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/timm/models/maxxvit.py", line 1255, in forward_features
    x = self.stem(x)
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/timm/models/maxxvit.py", line 1098, in forward
    x = self.norm1(x)
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/timm/layers/norm_act.py", line 118, in forward
    x = F.batch_norm(

However, I don’t get the same error if I pass in a real example batch of images instead of torch_tensorrt.Input:

    images = images.half()
    model = model.half()
    trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=[images], enabled_precisions={torch.half, torch.float}, output_format="torchscript")

I enabled debug logging in torch_tensorrt and found the root error message, though I’m still not sure how to get around it:

Traceback (most recent call last):
  File "/mnt/c/Coding/Testing/PyTorch/MultiClassImageClassification/src/compressmodel.py", line 48, in <module>
    trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=[input], enabled_precisions = {torch.half, torch.float}, output_format="torchscript")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/_compile.py", line 228, in compile
    trt_graph_module = dynamo_compile(
                       ^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/dynamo/_compiler.py", line 236, in compile
    trt_gm = compile_module(gm, inputs, settings)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/dynamo/_compiler.py", line 346, in compile_module
    trt_module = convert_module(
                 ^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 56, in convert_module
    interpreter_result = interpreter.run()
                         ^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 152, in run
    super().run()
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch/fx/interpreter.py", line 138, in run
    self.env[node] = self.run_node(node)
                     ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 276, in run_node
    trt_node: torch.fx.Node = super().run_node(n)
                              ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch/fx/interpreter.py", line 195, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py", line 362, in call_function
    return converter(self.ctx, target, args, kwargs, self._cur_node_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/aten_ops_converters.py", line 103, in aten_ops_batch_norm_legit_no_training
    return impl.normalization.batch_norm(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/multilabelimage_model_env/lib/python3.11/site-packages/torch_tensorrt/dynamo/conversion/impl/normalization/ops.py", line 60, in batch_norm
    if not ctx.net.has_implicit_batch_dimension and len(input.shape) < 4:
                                                    ^^^^^^^^^^^^^^^^
ValueError: __len__() should return >= 0

Are you seeing this error in the latest nightly release?
CC @narendasan

I just tried:

python -m pip install --pre --upgrade torch torch-tensorrt tensorrt --extra-index-url https://download.pytorch.org/whl/nightly/cu121

to upgrade to the nightly version, but I get an error when upgrading tensorrt:

Collecting tensorrt
  Using cached tensorrt-9.3.0.post12.dev1.tar.gz (6.9 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-dpkz1y4l/tensorrt_4c264976144a4ad8b7c6229173abb5ed/setup.py", line 90, in <module>
          raise RuntimeError("Bad params")
      RuntimeError: Bad params
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

Also, as someone from NVIDIA, do you happen to know which of these options would give the best inference speed with TensorRT?

  1. Converting a torch model to TensorRT using torch-tensorrt (AOT compilation)
  2. Converting a PyTorch model to ONNX and then running the ONNX model with the TensorRT backend
  3. Converting the PyTorch model directly to TensorRT with torch2trt (NVIDIA-AI-IOT/torch2trt)
  4. Using TensorRT as the backend for torch.compile inside PyTorch directly (the JIT compilation version of option 1; a rough sketch of what I mean is below)
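
For reference, by option 4 I mean something like the following (a minimal sketch; I’m assuming the "torch_tensorrt" backend name and the options dict from the torch_tensorrt docs, so the exact spelling may differ by release):

    import torch
    import torch_tensorrt  # importing registers the TensorRT backend for torch.compile

    model = model.half().eval().cuda()

    # JIT path: TensorRT engines are built lazily the first time each input
    # shape is seen, instead of ahead of time.
    compiled = torch.compile(
        model,
        backend="torch_tensorrt",
        options={"enabled_precisions": {torch.half, torch.float}},
    )

    out = compiled(images.half().cuda())  # first call triggers compilation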

I’m not actively developing TorchTRT, so @narendasan can correct me, but I believe option 2 might be the fastest, while option 1 should give you more flexibility since it can fall back to PyTorch operations if needed. Option 3 is deprecated.


I was trying option 2 and struggling on that front as well. I made another post with a holistic write-up of everything I’ve tried (my model, my model file, logs, errors, etc.).

The original error seems like it’s due to the use of dynamic shapes with an operator converter that doesn’t fully support them. Please file an issue in pytorch/tensorrt for this.

In terms of performance, option 2 should be considered the upper bound, since you have full control over the resulting engine and how it is run, at the cost of additional developer work. We try to get option 1 as close to option 2 as we can, but because some ops may not get converted, or there may be PyTorch-specific overhead involved, sometimes this is not possible. Option 4, since it is JIT, is the most flexible, but you may sometimes hit recompilation or other details that impact performance.
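
As a workaround until the converter handles dynamic shapes, you could try compiling with a static Input so the batch-norm converter sees a fully known shape (a rough sketch; a fixed batch size of 16 is assumed here):

    # Static-shape variant of the original Input: one shape instead of min/opt/max.
    static_input = torch_tensorrt.Input(
        shape=(16, 3, config.model_image_size, config.model_image_size),
        dtype=torch.half,
        name="x",
    )

    trt_gm = torch_tensorrt.compile(
        model,
        ir="dynamo",
        inputs=[static_input],
        enabled_precisions={torch.half, torch.float},
        output_format="torchscript",
    )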

Hi @narendasan, I would appreciate it if you could confirm whether my understanding of performance is correct.

Given the following two pipelines, where the input is a torch module and the output is a serialized engine (to be loaded/run via the TensorRT C++ API):

onnx pipeline:
torch.onnx.export or torch.onnx.dynamo_export → tensorrt.OnnxParser with tensorrt.Builder

pytorch2 native pipeline:
torch.export.export or torch_tensorrt.dynamo.trace → torch_tensorrt.dynamo.convert_module_to_trt_engine

the onnx pipeline is currently faster than the pytorch2 native pipeline

I suppose a torch.export.export or torch_tensorrt.dynamo.trace could also be used before the torch.onnx.dynamo_export in the onnx pipeline, though I’m not sure if that would make a difference.
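
For concreteness, here is roughly what I mean by the two pipelines (a sketch only; the exact signatures, especially on the torch_tensorrt side, may differ across releases, and the model/file names are placeholders):

    import tensorrt as trt
    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()                 # placeholder model
    example = torch.randn(16, 3, 224, 224, device="cuda")

    # --- onnx pipeline: torch -> ONNX -> serialized TensorRT engine ---
    torch.onnx.export(model, example, "model.onnx", input_names=["x"], output_names=["y"])

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    with open("model_onnx.engine", "wb") as f:
        f.write(builder.build_serialized_network(network, config))

    # --- pytorch2 native pipeline: torch.export -> torch_tensorrt -> serialized engine ---
    exported = torch.export.export(model, (example,))
    engine_bytes = torch_tensorrt.dynamo.convert_module_to_trt_engine(
        exported,
        inputs=[torch_tensorrt.Input(shape=example.shape, dtype=torch.float)],
        enabled_precisions={torch.half, torch.float},
    )
    with open("model_native.engine", "wb") as f:
        f.write(engine_bytes)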

Also, sorry about bumping an old, mostly unrelated thread. Your earlier comment is the only guidance I’ve found so far on which of these pipelines to use when performance is the primary concern.

"The onnx pipeline is currently faster than the pytorch2 native pipeline"

This could be the case because of the specific graphs that come out of torch.onnx.dynamo_export vs. torch_tensorrt / torch.dynamo + AOTExport.

torch_tensorrt targets Core ATen, a minimal spanning set of ops such that any PyTorch op can be implemented as a subgraph of Core ATen ops. ONNX, by contrast, has more “composite ops”, i.e. ops that could be decomposed further into simpler ops but are left fused, and TRT is mostly designed around ONNX’s level of composition. So, for example, an upsample operator might be 1 op in ONNX and tens of ops in Core ATen, and TRT won’t necessarily identify and fuse that subgraph back down to an upsample node during engine building; it mostly relies on the graph translation phase to do that.
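
One way to see the composition gap for yourself (a small sketch; the exact decomposition you get depends on the PyTorch version and the decomposition table) is to export a module containing an upsample and inspect the resulting graph:

    import torch
    import torch.nn as nn

    class Up(nn.Module):
        def forward(self, x):
            return nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")

    ep = torch.export.export(Up(), (torch.randn(1, 3, 8, 8),))
    # Lower to the Core ATen opset; depending on the decomposition table the
    # upsample may remain a single aten op or expand into several smaller ops.
    core = ep.run_decompositions()
    print(core.graph_module.graph)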

So there is a trade-off: by using Core ATen it is easier to add support for all of PyTorch, and new PyTorch APIs are automatically supported through decomposition to the core opset, but we need to do more work on identifying the high-level ops we want to keep around. With PyTorch 2.4 we completed support for Core ATen, so now we are starting to look at this HLO (high-level op) problem more.