Performance analysis of running multiple model instances on a single GPU

Besides consuming memory, each CUDA kernel also uses compute resources, which are shared among all applications running on the GPU. This post explains it in more detail and references a great GTC talk.
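One practical consequence: memory gives a hard upper bound on how many instances fit on the GPU, while compute is time-sliced among whatever is resident, so per-instance throughput drops as you add instances. A minimal sketch of the memory-side sizing, with hypothetical memory figures (the function name and numbers are illustrative, not from any specific library):

```python
def max_instances(total_mem_gb, per_instance_mem_gb, reserve_gb=1.0):
    """Upper bound on co-resident model instances by memory alone.

    Compute sharing is not modeled here: even below this bound,
    kernels from the instances compete for the same SMs.
    """
    usable = total_mem_gb - reserve_gb  # keep headroom for CUDA context etc.
    return max(0, int(usable // per_instance_mem_gb))

# e.g. a hypothetical 16 GB GPU, 3 GB per instance, 1 GB reserved
print(max_instances(16, 3))  # -> 5
```

Even when five instances fit in memory, their kernels contend for the same streaming multiprocessors, so aggregate throughput rarely scales linearly with instance count.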