Use joblib to train an ensemble of small models on the same GPU in parallel

That’s not necessarily the case as you would need to have free compute resources on your GPU and would also need to use different streams while still providing enough resources for a parallel execution. If a single stream already saturates the compute resources from your GPU, the stream launches will be serialized as they cannot run in parallel.