torch.distributed.barrier() hangs in DDP

Manuel_Alejandro_Dia · March 18, 2021, 9:22am

As @rvarm1 suggested in the GitHub issue, the problem is solved by running validation on the underlying local model rather than on the DDP-wrapped one.

So instead of using:

evaluation_metrics = evaluate(model)

I should use:

evaluation_metrics = evaluate(model.module)
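
In case it helps anyone who wants the same evaluation call to work whether or not the model has been wrapped, here is a minimal sketch of a small helper. The name unwrap_model is my own and is not part of PyTorch. It only assumes that the wrapper exposes the local model as a .module attribute, which is what DistributedDataParallel does:

```python
def unwrap_model(model):
    """Return the local model behind a DDP-style wrapper.

    Hypothetical helper (not a PyTorch API). It relies on duck typing:
    DistributedDataParallel stores the wrapped model in `.module`, so if
    that attribute exists we return it, otherwise we return `model` as is.

    Caveat: a plain model that happens to have its own attribute named
    `module` would also be "unwrapped". If that is a concern, an
    isinstance check against the DDP class is the stricter alternative.
    """
    return getattr(model, "module", model)
```

With this, the validation call above could be written as evaluate(unwrap_model(model)), and it would behave the same in a single-process run where the model was never wrapped.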

Thanks!