Why do I get 3 other processes on GPU 0 when using DDP to train a model?

I guess you are initializing additional CUDA contexts on the default device (most likely from the other processes). Check whether any CUDA operations run on the default device, GPU 0, instead of the device assigned to each process in the DDP launch.
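As a minimal sketch of what that check might lead to, assuming the job is started with a `torchrun`-style launcher that exports `LOCAL_RANK`: select the rank's GPU before any other CUDA call, and pass `map_location` when loading a checkpoint so its tensors are not restored onto the device they were saved from (often `cuda:0`). The helper names and the `checkpoint_path` argument are hypothetical and only for illustration; your script may get its rank differently.

```python
import os


def local_rank_from_env(env=None):
    """Return this process's local rank.

    Assumes a torchrun-style launcher that exports LOCAL_RANK;
    falls back to 0 when the variable is missing.
    """
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", 0))


def pin_to_local_gpu(checkpoint_path=None):
    """Select this rank's GPU before doing any other CUDA work.

    checkpoint_path is a hypothetical argument used only for illustration.
    """
    import torch  # imported lazily so the helper above works without PyTorch

    local_rank = local_rank_from_env()

    # Make this rank's GPU the current device so that later bare
    # .cuda() / "cuda" calls do not fall back to GPU 0.
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    state = None
    if checkpoint_path is not None:
        # Without map_location, tensors are restored onto the device
        # they were saved from, which is frequently cuda:0.
        state = torch.load(checkpoint_path, map_location=device)
    return device, state
```

Calling `pin_to_local_gpu()` at the top of each worker, before building the model or wrapping it in `DistributedDataParallel`, should leave each process with a context only on its own GPU. If extra processes still show up on GPU 0 in `nvidia-smi`, look for remaining bare `.cuda()` or `torch.device("cuda")` calls that run before the device is set.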