DistributedDataParallel: RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable

Exclusive mode might be the right choice for your compute cluster, and you can stick with it if it’s working.
However, I would not recommend it as the default mode on a local workstation if you are unsure about its limitations (only a single context can be created on each device).
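
If you are not sure which compute mode your GPUs are in, something like the following sketch could help. It assumes `nvidia-smi` is on your PATH and queries its `compute_mode` field; the helper names (`parse_compute_modes`, `query_compute_modes`, `exclusive_gpus`) are made up for this example and are not part of any library.

```python
from __future__ import annotations

import subprocess


def parse_compute_modes(csv_text: str) -> dict[int, str]:
    """Parse 'index, compute_mode' rows as printed with --format=csv,noheader."""
    modes = {}
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = mode
    return modes


def query_compute_modes() -> dict[int, str]:
    """Ask nvidia-smi for the compute mode of every GPU (needs the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,compute_mode", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_compute_modes(result.stdout)


def exclusive_gpus(modes: dict[int, str]) -> list[int]:
    """Return the indices of GPUs whose mode name starts with 'Exclusive'."""
    return sorted(i for i, mode in modes.items() if mode.startswith("Exclusive"))
```

Calling `exclusive_gpus(query_compute_modes())` on the affected machine should list the devices that accept only one context. If you decide to go back to the default mode, `sudo nvidia-smi -c DEFAULT` should reset it (add `-i <gpu id>` to target a single device).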

The recommended approach is to use DistributedDataParallel with a single process per GPU.
Each process would thus create its own context on its assigned device.
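
A minimal sketch of the one-process-per-GPU setup, assuming the script is launched with `torchrun` (which sets `LOCAL_RANK` and the other rendezvous variables) and that the NCCL backend is available. The linear model, the random batch, and the helper `device_index_for` are placeholders for illustration only, not your actual training code.

```python
import os


def device_index_for(local_rank: int, visible_gpus: int) -> int:
    """Map a process's local rank to the single GPU it should use."""
    if not 0 <= local_rank < visible_gpus:
        raise ValueError(
            f"local rank {local_rank} has no matching GPU ({visible_gpus} visible)"
        )
    return local_rank


def main() -> None:
    # Imported here so the helper above can be used without PyTorch installed.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    index = device_index_for(local_rank, torch.cuda.device_count())
    device = torch.device("cuda", index)

    # Bind this process to its GPU before any CUDA work, so it only
    # creates a context on that one device.
    torch.cuda.set_device(device)
    dist.init_process_group(backend="nccl")  # reads RANK, WORLD_SIZE, MASTER_* from env

    model = torch.nn.Linear(10, 1).to(device)  # placeholder model
    ddp_model = DDP(model, device_ids=[index])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # One training step on random placeholder data.
    inputs = torch.randn(8, 10, device=device)
    targets = torch.randn(8, 1, device=device)
    loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()  # gradients are all-reduced across processes here
    optimizer.step()

    dist.destroy_process_group()
```

To try it, call `main()` under an `if __name__ == "__main__":` guard and launch the script with `torchrun --nproc_per_node=<number of GPUs> your_script.py`, where `your_script.py` stands for whatever you name the file. Since each process only touches its own device, this pattern should also work if the GPUs stay in exclusive mode.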