I noticed two strange things:
First, I modified the code while a Distributed training job was running. After a while, the running application reported an error caused by my modification.
Second, I added a print statement to the code, and after a while the program that was already running started printing accordingly.
@ckx506772099 Sorry for the late reply! Please see "DDP is affected by code modification": this is a behavior of Python's multiprocessing module rather than of DDP itself.
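
To make that a bit more concrete, here is a rough sketch that does not involve DDP at all. It assumes the worker processes are fresh Python interpreters (which is what the `spawn` start method launches), and it uses `subprocess` as a stand-in for such a worker. The file name `hypothetical_worker.py` and the helpers `run_fresh_interpreter` and `demo` are made up for illustration. The point is only that a process started after an edit reads the edited source from disk, even though the parent job has been running the whole time.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_fresh_interpreter(script):
    """Start a new Python process (stand-in for a spawn-style worker)
    and return what it printed."""
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo():
    """Launch a worker, edit its source on disk, then launch another worker."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical worker script; stands in for the training code on disk.
        script = Path(tmp) / "hypothetical_worker.py"
        script.write_text('print("original code")\n')
        before = run_fresh_interpreter(script)

        # Simulate editing the file while the parent job is still running.
        script.write_text('print("edited code")\n')
        after = run_fresh_interpreter(script)
    return before, after


if __name__ == "__main__":
    print(demo())
```

If your job starts new worker processes partway through a run, they can pick up edits in the same way, which would match both the delayed error and the delayed print output. Running from a copy of the code that you do not edit should avoid it.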