Nvidia suggests using the TensorRT framework for performant inference deployment. For DLRM (Deep Learning Recommendation Model) inference on GPU, I have the following questions:
Does TensorRT modify the backend (CUDA/C++ source code) of the EmbeddingBag operator, or does it use the exact same vanilla PyTorch CUDA kernels?
What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?
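For context, here is a minimal sketch of the export path I have in mind. The toy model, feature sizes, and ONNX-to-trtexec flow below are just my assumptions for illustration, not an official recipe:

```python
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    """A toy DLRM-style model: sparse embedding lookups plus a dense MLP."""
    def __init__(self, num_embeddings=1000, embedding_dim=16, dense_in=13):
        super().__init__()
        # The operator my question is about: does TensorRT keep the
        # PyTorch CUDA kernel for this, or replace it with its own?
        self.emb = nn.EmbeddingBag(num_embeddings, embedding_dim, mode="sum")
        self.mlp = nn.Sequential(
            nn.Linear(dense_in + embedding_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, dense, indices, offsets):
        pooled = self.emb(indices, offsets)              # sparse features
        return self.mlp(torch.cat([dense, pooled], 1))   # dense + sparse

model = TinyDLRM().eval()
dense = torch.randn(4, 13)
indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 2, 4, 6])

# Export to ONNX, then build a TensorRT engine with e.g.
#   trtexec --onnx=tiny_dlrm.onnx
# (whether EmbeddingBag exports cleanly depends on the PyTorch version
# and opset, which is part of what I am asking about).
torch.onnx.export(model, (dense, indices, offsets), "tiny_dlrm.onnx",
                  input_names=["dense", "indices", "offsets"],
                  output_names=["score"], opset_version=13)
```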
Please share your thoughts. Thanks!