What are the recommended ways to deploy a PyTorch model to a desktop machine (with 1080 Ti GPUs) for fast inference?
Related to this, does NVIDIA TensorRT speed up PyTorch model inference on 1080 Ti GPUs? If so, are there any benchmarks showing by how much for typical deep learning models?
Would Caffe2 be another option?
We typically see a 2x speedup using TensorRT, with an additional 2x if you go to INT8. It is pretty amazing actually.
Same for me, I have seen a large speedup using TensorRT. I don’t remember the exact number, but it was substantial.
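Since nobody here has exact numbers, the easiest way to get them is to time both versions of your own model on your own card. Below is a minimal sketch of a timing harness in plain Python. The names `time_inference` and `speedup` and the stand-in workloads are hypothetical, made up for illustration; swap in your real PyTorch and TensorRT inference calls.

```python
import statistics
import time


def time_inference(fn, warmup=5, iters=20, sync=None):
    """Return the median wall-clock seconds of one call to fn()."""
    # Warm-up calls let lazy initialization and caches settle before timing.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_fn, candidate_fn, **kwargs):
    """Ratio above 1.0 means candidate_fn is faster than baseline_fn."""
    baseline = time_inference(baseline_fn, **kwargs)
    candidate = time_inference(candidate_fn, **kwargs)
    return baseline / candidate


if __name__ == "__main__":
    # Stand-in workloads (hypothetical); replace with real model calls.
    slow = lambda: sum(i * i for i in range(200_000))
    fast = lambda: sum(i * i for i in range(50_000))
    print(f"speedup: {speedup(slow, fast):.2f}x")
```

When timing PyTorch on a GPU, pass `sync=torch.cuda.synchronize`. CUDA kernels are launched asynchronously, so without the synchronize call the clock stops before the GPU has finished and the numbers look better than they really are. Using the median rather than the mean keeps one slow outlier from skewing the result.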
Guys, could you point me to a tutorial or guide on how to run inference on a PyTorch model with TensorRT?