Performance analysis of running multiple model instances on a single GPU

Besides consuming memory, each CUDA kernel also uses compute resources, which are shared among all applications running on the GPU. This post explains it in more detail and references a great GTC talk.
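One practical consequence: memory gives a hard upper bound on how many instances fit on the GPU, while compute is time-sliced among whatever is resident, so per-instance throughput drops as you add instances. A minimal sketch of the memory-side sizing, with hypothetical memory figures (the function name and numbers are illustrative, not from any specific library):

```python
def max_instances(total_mem_gb, per_instance_mem_gb, reserve_gb=1.0):
    """Upper bound on co-resident model instances by memory alone.

    Compute sharing is not modeled here: even below this bound,
    kernels from the instances compete for the same SMs.
    """
    usable = total_mem_gb - reserve_gb  # keep headroom for CUDA context etc.
    return max(0, int(usable // per_instance_mem_gb))

# e.g. a hypothetical 16 GB GPU, 3 GB per instance, 1 GB reserved
print(max_instances(16, 3))  # -> 5
```

Even when five instances fit in memory, their kernels contend for the same streaming multiprocessors, so aggregate throughput rarely scales linearly with instance count.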