I guess you are initializing additional CUDA contexts on the default device (most likely from the other processes). Check whether any CUDA operations are running on the default device instead of the device assigned to each process in the DDP launch.
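As a rough sketch of what to look for, assuming the script is launched with `torchrun` (which sets the `LOCAL_RANK` environment variable), each process could pin itself to its own GPU before any other CUDA call. The helper names `local_device` and `pin_to_local_device`, and the `path` in the usage comments, are hypothetical and not taken from your code:

```python
import os


def local_device(env=os.environ) -> str:
    """Return the device string for this process, e.g. "cuda:1"."""
    # LOCAL_RANK is set by torchrun; falling back to 0 is an assumption
    # for single-process runs.
    return f"cuda:{int(env.get('LOCAL_RANK', 0))}"


def pin_to_local_device():
    """Make this process's GPU the current CUDA device and return it."""
    import torch  # imported lazily so local_device() works without torch

    device = torch.device(local_device())
    # After this call, bare .cuda() or "cuda" resolves to this GPU
    # instead of the default device (cuda:0).
    torch.cuda.set_device(device)
    return device


# Usage in each DDP process, before any other CUDA call:
# device = pin_to_local_device()
# model = model.to(device)
# state = torch.load(path, map_location=device)  # avoid restoring onto cuda:0
```

Typical places where the default device sneaks in are `.cuda()` or `torch.device("cuda")` without an index, and `torch.load` without `map_location`, which restores tensors onto the device they were saved from.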