MPI Backend with GPU support

I want to train my model on 2 nodes of an HPC system; each node contains 4 NVIDIA V100 GPUs. The system requires MPI when more than one node is used, but I don't have much expertise in MPI, and I have found very little information about the PyTorch MPI backend with GPU support. While preparing the model, I intended to train it on a single machine with 8 GPUs, but unfortunately I don't have access to such a machine; the HPC system mentioned above is my only option. I have already gone through the Open MPI documentation and successfully compiled PyTorch 1.5.1 from source with CUDA 10.1 and CUDA-aware Open MPI 3.0.4. I would highly appreciate it if someone could provide a snippet of code showing the specific changes I need to make to my source code so that I can train the model on the HPC system.
Thank you.

Did you hit any errors when using the CUDA-aware MPI backend? Based on past discussions, you might need to synchronize CUDA streams in the application code when using CUDA-aware MPI. BTW, is MPI the only option for you, or would the Gloo backend work?
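For reference, the stream synchronization mentioned above could look roughly like this. This is only a sketch; `allreduce_with_sync` is a hypothetical helper name, and the idea is simply to drain the current CUDA stream before handing a GPU buffer to a CUDA-aware MPI collective:

```python
import torch
import torch.distributed as dist

def allreduce_with_sync(tensor):
    """All-reduce that first drains pending CUDA work.

    Hypothetical helper (not an official API): with a CUDA-aware MPI
    backend, kernels still writing into `tensor` may be in flight when
    MPI reads the buffer, so we synchronize the device first.
    """
    if tensor.is_cuda:
        torch.cuda.synchronize(tensor.device)
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    return tensor
```

On CPU tensors (or a non-MPI backend) the synchronization is skipped and this behaves like a plain `all_reduce`.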

MPI is the only option for me.

Maybe something along these lines will work.
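A minimal sketch of an MPI-backend setup for the 2 × 4-GPU case described above, assuming a PyTorch build with CUDA-aware Open MPI. The helper name `local_rank_from_env`, the `hosts` file, and the `Linear` stand-in model are assumptions for illustration; `OMPI_COMM_WORLD_LOCAL_RANK` is the per-node rank variable that Open MPI exports to each process:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def local_rank_from_env(env, rank, gpus_per_node=4):
    # Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK for each process;
    # fall back to rank % gpus_per_node (4 V100s per node here) if absent.
    val = env.get("OMPI_COMM_WORLD_LOCAL_RANK")
    return int(val) if val is not None else rank % gpus_per_node

def main():
    # With the MPI backend, rank and world size come from the MPI
    # runtime, so init_process_group needs no extra arguments.
    dist.init_process_group(backend="mpi")
    rank = dist.get_rank()

    # Map each rank to one of the 4 GPUs on its node.
    local_rank = local_rank_from_env(os.environ, rank)
    torch.cuda.set_device(local_rank)

    # Stand-in model; replace with your own model, then wrap it in DDP.
    model = torch.nn.Linear(10, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # ... build a DataLoader with a DistributedSampler and train as usual ...

# In your training script you would call main() and launch one process
# per GPU with mpirun, e.g.:
#   mpirun -np 8 --hostfile hosts python train.py
```

Note that, unlike the Gloo or NCCL backends, the MPI backend is launched with `mpirun` rather than `torch.distributed.launch`, and rank/world size are taken from the MPI runtime.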