torch.distributed.barrier() hangs in DDP

As @rvarm1 suggested in the GitHub issue, the problem is solved by running validation on the local (unwrapped) model rather than on the DDP-wrapped one.

So instead of using:

evaluation_metrics = evaluate(model)

I should use:

evaluation_metrics = evaluate(model.module)
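For anyone landing here later, this is a minimal sketch of the pattern, under the assumption (not stated above) that validation runs on rank 0 only while the other ranks wait at the barrier. In that setup the DDP wrapper's forward pass may issue collective calls, such as buffer broadcasts, that the waiting ranks never join, which would explain the hang. The helper names `unwrap_ddp` and `evaluate_on_rank0` are made up for illustration, and `evaluate`, `rank` and `barrier` are passed in so that the snippet does not depend on a running process group.

```python
def unwrap_ddp(model):
    """Return the local model behind a DDP-style wrapper.

    Assumption: the wrapper exposes the wrapped model as `.module`,
    as DistributedDataParallel does. Anything without that attribute
    is returned unchanged.
    """
    return getattr(model, "module", model)


def evaluate_on_rank0(model, evaluate, rank, barrier):
    """Run `evaluate` on rank 0 only, then synchronise all ranks.

    Hypothetical helper: `evaluate` is your own validation function,
    `rank` is this process's rank, and `barrier` is a callable that
    blocks until every rank has reached it.
    """
    metrics = None
    if rank == 0:
        # Use the local model so that the forward pass does not go
        # through the DDP wrapper while the other ranks are waiting.
        metrics = evaluate(unwrap_ddp(model))
    # Every rank, including rank 0, reaches the barrier exactly once.
    barrier()
    return metrics
```

In a real training script the call would look roughly like `evaluate_on_rank0(model, evaluate, torch.distributed.get_rank(), torch.distributed.barrier)`.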

Thanks!
