All_reduce for TorchElastic

The TorchElastic code does NOT compute an all_reduce to aggregate performance metrics across all GPUs (please check the ImageNet example). How does TorchElastic gather values from all GPUs? Thanks in advance!!

TorchElastic treats the model mostly as a black box. There's no magic happening here: in that example there is no sync of the metrics between GPUs. If you did want that behavior, you should do something like the example in the core PyTorch repo.
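
For illustration, here is a minimal sketch of what that metric sync could look like, assuming the `torch.distributed` process group is already initialized (TorchElastic sets the necessary env vars for `init_process_group`). The `average_metric` helper name is hypothetical, not part of TorchElastic:

```python
import torch
import torch.distributed as dist

def average_metric(value: float, device: torch.device) -> float:
    """All-reduce a scalar metric across ranks and return the global mean.

    Assumes dist.init_process_group(...) has already been called,
    e.g. via the environment TorchElastic sets up for each worker.
    """
    tensor = torch.tensor([value], dtype=torch.float32, device=device)
    # Sum the metric across all ranks, then divide by world size.
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    tensor /= dist.get_world_size()
    return tensor.item()

# Example usage at the end of a validation loop:
# val_loss = average_metric(local_val_loss, device)
```

Every rank must call `all_reduce` (it's a collective), and after the call each rank holds the same averaged value, which you'd typically log only on rank 0.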