Multiple inference models on Jetson GPU

Martin_Pedersen · September 22, 2022, 8:49am

Hello

I am trying to run a Yolov5 model and a Resnet152 model on the GPU of my Jetson Xavier NX16.
At first, the Yolov5 model was struggling to utilize the GPU: its inference time was about 4-5 seconds, while Resnet152 inference took around 0.2 seconds. After some digging, I switched Yolov5 to a TensorRT model to get better results. That brought Yolov5 inference down to 0.3 seconds, but Resnet152 inference went up to 3-4 seconds.
I have tried moving each model to the GPU only for its own inference call:

# Move Yolov5 to the GPU, run it, then move it back to the CPU
Yolov5model.cuda()
Yolov5results = Yolov5model(input)
Yolov5model.cpu()
# Do the same for Resnet152
Resnet152model.cuda()
Resnet152results = Resnet152model(input)
Resnet152model.cpu()

But it didn’t seem to change anything.
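To narrow down where the time goes, each step (moving a model to the GPU, the forward pass, moving it back) could be timed on its own. Below is a minimal sketch of such a timer, not something taken from my actual script. The helper name `time_stages` and its `sync` parameter are made up for illustration. The assumption behind `sync` is that PyTorch launches GPU work asynchronously, so something like `torch.cuda.synchronize` has to be called around each step, otherwise the time may be attributed to the wrong step.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


def time_stages(
    stages: List[Tuple[str, Callable[[], object]]],
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Run each named stage in order and return its wall-clock time in seconds.

    `sync` (hypothetical parameter) is called before and after every stage so
    that queued asynchronous GPU work is counted in the stage that launched it.
    """
    timings: Dict[str, float] = {}
    for name, fn in stages:
        if sync is not None:
            sync()  # make sure earlier GPU work has finished
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this stage's GPU work to finish
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without a GPU or any model loaded.
    demo = time_stages([
        ("to_gpu", lambda: time.sleep(0.02)),
        ("forward", lambda: time.sleep(0.01)),
    ])
    for name, seconds in demo.items():
        print(f"{name}: {seconds:.3f} s")
```

In the script above, the stages would presumably be entries such as `("yolo_to_gpu", Yolov5model.cuda)` and `("yolo_forward", lambda: Yolov5model(input))`, with `sync=torch.cuda.synchronize`. If the `.cuda()` and `.cpu()` steps turn out to dominate, the slowdown is in the transfers rather than in the inference itself.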
The two models don’t need to run at the same time, so I don’t see why the GPU can’t handle two models loaded in one script.
Is there a way to make the two models share resources? Or do I need to convert the Resnet152 model to TensorRT as well, because of some kind of prioritization on the Jetson GPU?