Error training multi-GPU with DDP

I am using multiple GPUs to train a model, and all 3 of my GPUs are detected. However, after loading the pre-trained model, I get errors from DDP as shown in the image below, and then my Jupyter notebook disconnects automatically.
The issue seems to come from PyTorch. I have upgraded PyTorch to the latest version, but it still does not work.
Hope you guys can help me address this issue. Thank you.
Note: I’m still able to train with single GPU.


The error you’re reporting is just the supervisor process telling you that one of the child processes failed. It doesn’t contain the actual cause.

You need to look at the individual output of each rank to understand what happened.
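If the per-rank output is hard to find (for example, when launching from a notebook), one option is to have each process write its own traceback to a file. Below is a minimal sketch of that idea using only the standard library. It assumes your launcher sets the `RANK` environment variable (torchrun does); `log_rank_errors`, `rank_logs` and `train` are hypothetical names for illustration, not part of PyTorch.

```python
import functools
import os
import traceback
from pathlib import Path


def log_rank_errors(log_dir="rank_logs"):
    """Save any exception raised in this process to a per-rank file, then re-raise."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Assumption: the launcher exports RANK; fall back to "0" otherwise.
            rank = os.environ.get("RANK", "0")
            try:
                return fn(*args, **kwargs)
            except Exception:
                out_dir = Path(log_dir)
                out_dir.mkdir(parents=True, exist_ok=True)
                # One file per rank, so the tracebacks do not overwrite each other.
                (out_dir / f"rank_{rank}.log").write_text(traceback.format_exc())
                raise  # still let the supervisor see that this rank failed

        return wrapper

    return decorator


@log_rank_errors()
def train():
    # Hypothetical placeholder: load the pre-trained model, wrap it in DDP, train.
    pass
```

After a failed run, check the `rank_*.log` files to see which rank failed first and why. Note that this only catches Python exceptions. If a process is killed outright (for example a segfault or the OS out-of-memory killer), nothing will be written, and you would need the raw stdout/stderr of that rank instead. PyTorch also ships a `record` decorator in `torch.distributed.elastic.multiprocessing.errors` that serves a similar purpose.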