Parallelising expert execution on single GPU using CUDA streams

This post might be helpful in explaining the memory and compute resources in the attached GTC talk.
