I am relatively new to TensorRT, so I would like to clarify how it behaves when combined with PyTorch. I understand that execute_async_v3 launches the kernel on the GPU and returns control to the Python interpreter immediately, so host and device can work asynchronously. Immediately after executing the kernel I want to perform a PyTorch operation, e.g. torch.nn.functional.interpolate, on the CUDA tensors. Do I need to call torch.cuda.synchronize() first to ensure that the result of the kernel execution is ready? Or is the PyTorch operation implicitly executed after the kernel if both are issued on the same stream? I ran a small experiment with and without torch.cuda.synchronize(), and both results are correct. However, the version without torch.cuda.synchronize() takes only 2 ms, compared to 9 ms with it.
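One way to compare the two timings on the GPU side, rather than from the host, is with CUDA events recorded on the same stream as the work. The following is a minimal sketch under assumptions: `gpu_time_ms` is a hypothetical helper (not part of torch2trt or TensorRT), the input shape is arbitrary, and the function falls back to a wall-clock timer when no CUDA device is present. A possible explanation for the gap, not verified on this setup, is that a host-side timer without synchronization stops before the queued GPU work has finished.

```python
import time

import torch
import torch.nn.functional as F


def gpu_time_ms(fn, *args, **kwargs):
    """Return (elapsed_ms, result) for fn, timed on the current CUDA stream."""
    if not torch.cuda.is_available():
        # CPU fallback so the sketch still runs without a GPU.
        t0 = time.perf_counter()
        out = fn(*args, **kwargs)
        return (time.perf_counter() - t0) * 1000.0, out

    stream = torch.cuda.current_stream()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record(stream)   # marker queued before the work
    out = fn(*args, **kwargs)
    end.record(stream)     # marker queued after the work
    end.synchronize()      # block the host only until the end marker is reached
    return start.elapsed_time(end), out


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.rand(1, 3, 64, 64, device=device)
elapsed_ms, y = gpu_time_ms(F.interpolate, x, scale_factor=2, mode="nearest")
print(f"{elapsed_ms:.3f} ms, output shape {tuple(y.shape)}")
```

The first call usually includes one-time warm-up cost, so in practice the measurement would be repeated a few times and the first run discarded.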
In my example I am using torch2trt/trt_module.py from the NVIDIA-AI-IOT/torch2trt repository on GitHub (commit 4e820ae31b4e35d59685935223b05b2e11d47b03). Thanks to this excellent repo, the TensorRT engine writes its results directly to a torch tensor on CUDA. An example of what I would like to do is:
# execute
outputs = [None] * len(self.output_names)
for i, output_name in enumerate(self.output_names):
    dtype = torch_dtype_from_trt(self.engine.get_tensor_dtype(output_name))
    shape = tuple(self.context.get_tensor_shape(output_name))
    device = torch_device_from_trt(self.engine.get_tensor_location(output_name))
    output = torch.empty(size=shape, dtype=dtype, device=device)
    outputs[i] = output
    self.context.set_tensor_address(output_name, output.data_ptr())
self.context.execute_async_v3(torch.cuda.current_stream().cuda_stream)
# here
torch.nn.functional.interpolate(outputs[0], **kwargs)
if self.output_flattener is not None:
    outputs = self.output_flattener.unflatten(outputs)
else:
    outputs = tuple(outputs)
    if len(outputs) == 1:
        outputs = outputs[0]
return outputs
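To make the ordering question concrete, here is a minimal sketch that contrasts issuing the producer and the consumer on the same stream with issuing the producer on a separate stream. It rests on assumptions: `enqueue_stand_in` is a hypothetical placeholder for the `execute_async_v3` call (it only fills a preallocated buffer with a PyTorch op, no TensorRT involved), and `run_then_upsample` is likewise a made-up name. Work queued on one CUDA stream runs in issue order; across streams an explicit `wait_stream` dependency is needed.

```python
import torch
import torch.nn.functional as F


def enqueue_stand_in(src, dst):
    # Hypothetical stand-in for context.execute_async_v3(...): it only
    # queues work that fills the preallocated output buffer `dst`.
    torch.mul(src, 2.0, out=dst)


def run_then_upsample(src, use_side_stream=False):
    dst = torch.empty_like(src)
    if not src.is_cuda or not use_side_stream:
        # Same stream (or CPU): work runs in the order it was issued,
        # so no host-side synchronization is placed between the two calls.
        enqueue_stand_in(src, dst)
        return F.interpolate(dst, scale_factor=2, mode="nearest")

    main = torch.cuda.current_stream()
    side = torch.cuda.Stream()
    side.wait_stream(main)   # src was produced on the main stream
    with torch.cuda.stream(side):
        enqueue_stand_in(src, dst)
    main.wait_stream(side)   # interpolate must not start before dst is filled
    return F.interpolate(dst, scale_factor=2, mode="nearest")


device = "cuda" if torch.cuda.is_available() else "cpu"
src = torch.arange(4, dtype=torch.float32, device=device).reshape(1, 1, 2, 2)
same = run_then_upsample(src)
side = run_then_upsample(src, use_side_stream=True)
# .cpu() copies to host memory and waits for the queued GPU work it depends on.
print(torch.equal(same.cpu(), side.cpu()))
```

In this sketch the host only waits when the result is actually read back with `.cpu()`; neither path calls `torch.cuda.synchronize()` between the producer and `interpolate`.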
Thank you in advance!
Environment
TensorRT Version: 10.4.0 (docker: dustynv/l4t-pytorch:r36.4.0)
PyTorch Version: 2.4.0
GPU Type: Tegra
System: Jetson Orin AGX