Got stuck using dist.barrier

To get a precise result on the test dataset, I run the evaluation only on the main process with the following code:

val_stat = self.evaluate() if utils.is_main_process() else None
dist.barrier()
update(val_stat)

The program then hangs.

Hey @ojipadeson, is there any collective communication in self.evaluate()? All processes must launch the same collective communications in the same order. If you are not sure whether self.evaluate() launches any collective, try removing val_stat = self.evaluate() if utils.is_main_process() else None and check whether dist.barrier() alone passes.