Also, IIUC, torch.distributed.run should be fully backward-compatible with torch.distributed.launch. Have you tried simply dropping in torch.distributed.run with the same launch arguments, and if so what sort of issues did you hit there?
Also, IIUC, torch.distributed.run should be fully backward-compatible with torch.distributed.launch. Have you tried simply dropping in torch.distributed.run with the same launch arguments, and if so what sort of issues did you hit there?