PyTorch model to ONNX to TensorRT engine = no speed-up for inference?

Hey everyone,

I’m working with a Jetson Nano device, TRT 6 (the latest version that can be used on the Nano), PyTorch 1.2.0 (compatible with TRT6), and Torchvision 0.4.0 (compatible with PyTorch 1.2.0).

I have a Torchvision MobileNetV2 model that I exported to ONNX with the built-in function:

    torch.onnx.export(pt_model, dummy_input, out_path, verbose=True)

I then built a TensorRT engine from this ONNX model:

    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(*EXPLICIT_BATCH) as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:

        builder.max_workspace_size = 1 << 28  # 256 MiB of builder scratch space
        builder.max_batch_size = 1
        builder.fp16_mode = True              # use FP16 kernels where supported
        # ... 
        engine = builder.build_cuda_engine(network)

Then I run inference with this new engine on my Jetson Nano and get a latency of about 0.045 seconds (22.2 fps). Running inference with the PyTorch version of this model gives almost exactly the same latency of 0.045 seconds.

I also tried switching to INT8 mode when building the TensorRT engine, but got the error: Builder failed while configuring INT8 mode.

Anyone have experience with optimizing Torch models with TensorRT? Am I missing something fundamental when building the TensorRT engine or should I expect a speed-up?



I haven’t tried to export MobileNet yet, but I could see a speed-up for Faster R-CNN.
How are you currently measuring the throughput?
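A common pitfall is timing a single forward pass, which folds one-time setup costs (CUDA context creation, cuDNN/TRT autotuning) into the number. A minimal harness that warms up first and averages many runs might look like this; `infer` is a hypothetical stand-in for one forward pass, and for GPU work you would also synchronize (e.g. `torch.cuda.synchronize()`) before reading the clock:

```python
import time

def benchmark(infer, n_warmup=10, n_runs=100):
    """Return the mean latency in seconds of the zero-argument callable `infer`.

    `infer` is a hypothetical wrapper around one forward pass, e.g. a
    TRT execution-context call or a PyTorch model call on a fixed input.
    """
    for _ in range(n_warmup):   # warm-up runs excluded from timing
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    return (time.perf_counter() - start) / n_runs

# Illustrative usage (names assumed, not from the thread):
# latency = benchmark(lambda: pt_model(dummy_input))
# print(f"{latency * 1000:.1f} ms per inference, {1 / latency:.1f} fps")
```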

Thanks for this thread. I am in the same boat, trying to figure out how to optimize networks for Jetson. @jdev Have you tried the INT8 quantization pipeline, first calibrating on images and then running INT8 inference? I believe you will run into difficulties with operations that are not supported.

Nvidia’s retinanet-examples repo gives a working pipeline, but it’s obviously much more useful to have this working for our own custom networks.