Evaluate during training with distributed

I am using the distributed training package to train on multiple GPUs. Training works fine, but I would also like to evaluate during training, either on one GPU or on all of them. If I call the evaluate function directly during training, each model replica produces different results. How can I get evaluation results every N steps while using the distributed package for training?
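Roughly, my training loop looks like the sketch below (the `evaluate` helper and the step interval are placeholders, not code from the actual script):

```python
import torch
import torch.distributed as dist

def train(model, optimizer, train_loader, eval_loader, eval_steps=500):
    # model is assumed to be wrapped in DistributedDataParallel in each process
    device = next(model.parameters()).device
    model.train()
    for step, (inputs, labels) in enumerate(train_loader, start=1):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        loss.backward()          # DDP all-reduces the gradients here
        optimizer.step()

        if step % eval_steps == 0:
            # Called like this, every rank runs its own evaluation and
            # each GPU prints a different number.
            metrics = evaluate(model, eval_loader)  # placeholder helper
            print(f"rank {dist.get_rank()}, step {step}: {metrics}")
```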

If you are using DDP, the model replicas are initialized in the same way in every process. Since DDP performs an all-reduce step on the gradients and assumes the optimizer updates the parameters identically in all processes, the model outputs should also be the same across ranks.
Are you also observing different outputs during training?
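One quick way to check is to broadcast rank 0's parameters after an optimizer step and diff them against the local copy. This is only a rough sketch, assuming the process group is already initialized and the model lives on a single GPU per process:

```python
import torch
import torch.distributed as dist

def check_replicas_in_sync(model, atol=1e-6):
    """Compare the local parameters against rank 0's copy."""
    device = next(model.parameters()).device
    max_diff = torch.tensor(0.0, device=device)
    for p in model.parameters():
        ref = p.detach().clone()
        dist.broadcast(ref, src=0)  # every rank receives rank 0's values
        max_diff = torch.maximum(max_diff, (p.detach() - ref).abs().max())
    print(f"rank {dist.get_rank()}: max parameter diff vs rank 0 = {max_diff.item():.3e}")
    return max_diff.item() <= atol
```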

When I evaluate during training, the evaluation runs on all GPUs and each one produces different results. I am actually running this script:

On line 248, there is a comment saying “Only evaluate when single GPU otherwise metrics may not average well”. I don’t understand why that is, or how I should change the script so that it evaluates correctly.
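Would something along these lines be the right way to do it? This is only a sketch of what I have in mind, not code from the script (`evaluate_distributed`, the batch size, and the accuracy metric are placeholders): shard the eval set with a `DistributedSampler` and all-reduce the summed counts so every rank ends up with the same number.

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

@torch.no_grad()
def evaluate_distributed(model, eval_dataset, batch_size=32):
    # Each rank evaluates only its own shard of the eval set.
    sampler = DistributedSampler(eval_dataset, shuffle=False)
    loader = DataLoader(eval_dataset, batch_size=batch_size, sampler=sampler)

    device = next(model.parameters()).device
    model.eval()
    correct = torch.tensor(0.0, device=device)
    total = torch.tensor(0.0, device=device)
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model(inputs).argmax(dim=-1)
        correct += (preds == labels).sum()
        total += labels.numel()

    # Sum the counts over all ranks so every process reports the same metric.
    dist.all_reduce(correct, op=dist.ReduceOp.SUM)
    dist.all_reduce(total, op=dist.ReduceOp.SUM)
    model.train()
    return (correct / total).item()
```

One thing I noticed is that `DistributedSampler` pads the dataset so that every rank gets the same number of samples, so a few examples can be counted twice; maybe that is what the “may not average well” comment refers to?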