I’m working with a Jetson Nano, TensorRT 6 (the latest version available on the Nano), PyTorch 1.2.0 (compatible with TRT 6), and torchvision 0.4.0 (compatible with PyTorch 1.2.0).
I have a torchvision MobileNetV2 model that I exported to ONNX with the built-in function:
torch.onnx.export(pt_model, dummy_input, out_path, verbose=True)
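For context, a self-contained version of that export would look something like this (the 1x3x224x224 dummy input and the output path are assumptions; the model comes straight from torchvision):

import torch
import torchvision

# Assumed setup -- mirrors the export call above; names and shapes are illustrative.
pt_model = torchvision.models.mobilenet_v2(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)   # MobileNetV2's usual input size
out_path = "mobilenetv2.onnx"

torch.onnx.export(pt_model, dummy_input, out_path, verbose=True)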
I then built a TensorRT engine from this ONNX model:

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(*EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_workspace_size = 1 << 28
    builder.max_batch_size = 1
    builder.fp16_mode = True
    # ...
    engine = builder.build_cuda_engine(network)
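The # ... above is where the ONNX file gets parsed into the network; a typical version of that step looks like this (onnx_path is assumed to be the file exported above):

# Typical parsing step elided above ("# ..."); the error loop is optional.
with open(onnx_path, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX file")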
I then run inference with this engine on the Jetson Nano and get a latency of about 0.045 seconds (~22 FPS). Running inference with the original PyTorch model gives almost exactly the same latency, about 0.045 seconds.
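For reference, a minimal way to measure the PyTorch-side latency would be something like the following (a sketch, assuming pt_model and dummy_input from the export step; the torch.cuda.synchronize() calls make sure the GPU work is actually included in the timing):

import time
import torch

# Assumes pt_model and dummy_input as above, moved onto the GPU.
model = pt_model.cuda().eval()
x = dummy_input.cuda()

with torch.no_grad():
    for _ in range(10):                  # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()             # wait for GPU work before stopping the clock
print("latency per run: %.4f s" % ((time.time() - start) / 100))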
I also tried switching to INT8 mode when building the TensorRT engine, but I get the error:
Builder failed while configuring INT8 mode.
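For completeness, the INT8 switch amounts to roughly the sketch below (replacing fp16_mode in the build above); MyCalibrator is a hypothetical placeholder for a calibrator implementation, not something shown here:

# Sketch of the INT8 configuration; INT8 normally also needs a calibrator
# (or explicitly set dynamic ranges). MyCalibrator is a hypothetical
# IInt8EntropyCalibrator2 subclass, used only to illustrate the assignment.
builder.int8_mode = True
builder.int8_calibrator = MyCalibrator()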
Does anyone have experience optimizing PyTorch models with TensorRT? Am I missing something fundamental when building the TensorRT engine, or is a speed-up simply not to be expected here?