High throughput ResNet50 model for distributed training with Horovod

Looking for a high-throughput (images/sec) ResNet50 PyTorch model for distributed training with Horovod on GPUs. The Horovod example code (using torchvision ResNet50, horovod/pytorch_imagenet_resnet50.py at master · horovod/horovod · GitHub) is there, but it looks like it does not run some functions (e.g. cross_entropy, maybe the optimizer as well) on the GPU.

I’m not sure how Horovod works, but can you modify the example with .to(device) so that the functions you want to run on the GPU actually use the GPU?
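
A minimal sketch of that kind of modification, assuming one Horovod process per GPU (selecting the device via hvd.local_rank() is the usual pattern; the hyperparameters and the `train_step` name here are illustrative, not taken from the example itself):

```python
import torch
import torch.nn.functional as F
import horovod.torch as hvd
from torchvision import models

hvd.init()
# One process per GPU: pin this process to its local GPU.
torch.cuda.set_device(hvd.local_rank())
device = torch.device("cuda", hvd.local_rank())

# Move the model to the GPU before building/wrapping the optimizer.
model = models.resnet50().to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size(), momentum=0.9)
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

def train_step(data, target):
    # Once the inputs live on the GPU, cross_entropy and the backward pass
    # run on the GPU too: ops follow the device of their input tensors.
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)
    loss = F.cross_entropy(output, target)
    loss.backward()
    optimizer.step()  # allreduce happens inside the Horovod-wrapped optimizer
    return loss
```

The optimizer updates whichever parameters it was given, so with the model already on the GPU the update step runs there as well.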

You may find more info on the Horovod forums or Stack Overflow, as many of the members here are more proficient in PyTorch distributed training than in Horovod.

Thanks. Is there a way to globally move all GPU-compatible functions to the GPU? The piecemeal approach could be error-prone. It also sounds like you are saying there is no "official" PyTorch ResNet50 + Horovod example code, would that be the right conclusion to draw?
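
A minimal sketch of the closest thing to a global approach (assuming one process per GPU and PyTorch 2.0+ for torch.set_default_device; batches coming out of a DataLoader still need an explicit transfer):

```python
import torch
import horovod.torch as hvd

hvd.init()
# Pin this process to its local GPU so that "cuda" refers to this device from now on.
torch.cuda.set_device(hvd.local_rank())

# PyTorch 2.0+: tensors created by factory functions (and parameters of modules
# constructed afterwards) default to this device, reducing per-call .to(device).
torch.set_default_device(f"cuda:{hvd.local_rank()}")

# Batches produced by a DataLoader are still CPU tensors and must be moved
# explicitly inside the loop, e.g. data = data.cuda(non_blocking=True).
```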