How do you submit the job? I ran into the same problem when launching with the nohup command (possibly because the job was affected by the terminal shutting down?). I am now trying the screen command instead.
DDP Error: torch.distributed.elastic.agent.server.api:Received 1 death signal, shutting down workers