EmbeddingBag operator on GPU


NVIDIA recommends the TensorRT framework for performant inference deployment. For DLRM (deep-learning-based recommendation system) inference on GPU, I have the following questions:

  • Does TensorRT modify the backend (CUDA/C++ source) of the EmbeddingBag operator, or does it use the exact same vanilla PyTorch CUDA kernels?

  • What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?
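For reference, this is the operator I mean, exercised with a minimal PyTorch sketch (the table size and dimensions here are illustrative, not from a real DLRM config):

```python
import torch
import torch.nn as nn

# EmbeddingBag fuses the embedding lookup and the pooling step
# (sum/mean/max) into a single operator; with mode="sum" it is the
# sparse-feature hot path in DLRM-style models. On a CUDA tensor the
# same module dispatches to PyTorch's CUDA kernels.
bag = nn.EmbeddingBag(num_embeddings=1000, embedding_dim=16, mode="sum")

# Two "bags" of indices, flattened into one tensor, with per-bag
# start offsets: bag 0 is indices[0:4], bag 1 is indices[4:8].
indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])

out = bag(indices, offsets)
print(out.shape)  # torch.Size([2, 16]) -- one pooled vector per bag
```

My question is whether TensorRT replaces the kernel behind this operator with its own implementation when the model is converted, or reuses PyTorch's.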

I'd appreciate any comments. Thanks!