How to reduce a CNN's inference time

Hello, I have recently been using a CNN (HRNet) for facial landmark detection, and I send the detected landmarks to a web server. I have hit a bottleneck in the model's efficiency. Inference on a GPU (GTX 2080) takes about 80 ms per image. Once preprocessing and I/O time are taken into account, getting the result for one image takes about 170 ms.
Should I use multithreading to speed up the process? I mean, could I use one thread for inference (running more images per batch on the GPU) and another thread to send the resulting landmarks to the server?
Or should I switch to TensorRT? I have heard it is a faster approach.
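
To make the multithreading idea concrete, here is a rough sketch of the two-thread layout I have in mind, using only the Python standard library. The names `run_inference_batch` and `send_to_server` are hypothetical placeholders for the real HRNet forward pass and the real upload call, so this only shows the structure (a producer thread feeding a queue that a sender thread drains), not an actual implementation.

```python
import queue
import threading

SENTINEL = None  # placed on the queue to signal that no more results will come


def run_inference_batch(images):
    # Hypothetical placeholder: the real version would run HRNet on the whole batch.
    return [f"landmarks-for-{img}" for img in images]


def send_to_server(result, sent):
    # Hypothetical placeholder: the real version would do the network upload.
    sent.append(result)


def inference_worker(images, batch_size, results):
    # Producer: run inference batch by batch and hand each result to the sender.
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        for item in run_inference_batch(batch):
            results.put(item)
    results.put(SENTINEL)


def sender_worker(results, sent):
    # Consumer: upload results as they arrive, until the sentinel is seen.
    while True:
        item = results.get()
        if item is SENTINEL:
            break
        send_to_server(item, sent)


def run_pipeline(images, batch_size=4):
    results = queue.Queue(maxsize=64)  # bounded, so inference cannot run far ahead
    sent = []
    producer = threading.Thread(target=inference_worker,
                                args=(images, batch_size, results))
    consumer = threading.Thread(target=sender_worker, args=(results, sent))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return sent


if __name__ == "__main__":
    print(run_pipeline([f"img{i}" for i in range(10)]))
```

With this layout the upload of one batch's results can overlap with inference on the next batch, so it should only help if a meaningful part of the 170 ms is spent waiting on I/O rather than on the GPU.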