That’s not necessarily the case as you would need to have free compute resources on your GPU and would also need to use different streams while still providing enough resources for a parallel execution. If a single stream already saturates the compute resources from your GPU, the stream launches will be serialized as they cannot run in parallel.