Torch distributed data-parallel vs Apex distributed data-parallel

Thanks for your reply. I have solved this problem. It was caused by running a partial dataloader only on local_rank=0 for a temporary evaluation. It seems that the dataloaders in all processes must stay in the same state.
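For anyone hitting the same issue, here is a minimal sketch (not the original code) of the pattern that causes the desync and one way to avoid it. The names `ddp_model`, `eval_loader`, and `local_rank` are placeholders I am assuming for illustration:

```python
# Sketch only: assumes a DDP-wrapped model and an extra eval dataloader.
import torch
import torch.distributed as dist

def evaluate_on_rank0(ddp_model, eval_loader, local_rank):
    # Problematic pattern: iterating a dataloader and running forward passes
    # only on local_rank == 0 while the other ranks continue training puts the
    # processes in different states, so the other ranks block at the next
    # collective operation.
    #
    # Safer pattern: either run the evaluation identically on every rank, or,
    # if only rank 0 should evaluate, bypass the DDP wrapper so no gradient
    # synchronization is triggered, then re-synchronize with a barrier.
    if local_rank == 0:
        model = ddp_model.module  # underlying module; no collective calls
        model.eval()
        with torch.no_grad():
            for batch in eval_loader:
                _ = model(batch.cuda(local_rank))
        model.train()
    dist.barrier()  # keep all processes in the same state afterwards
```

The `dist.barrier()` at the end is what keeps the ranks aligned: every process waits there until rank 0 finishes its evaluation, so training resumes with all dataloaders and models in the same state.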