How to compile Multi-GPU with OpenMPI backend?

The documentation says:
"Choose and install your favorite MPI implementation. Note that enabling CUDA-aware MPI might require some additional steps. In our case, we’ll stick to Open-MPI without GPU support: conda install -c conda-forge openmpi"

So what are the additional steps one needs to take?

I had a similar problem and wrote up what did the trick for me.
Hope it works for you: Segfault using cuda with openmpi
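
Before rebuilding anything, it may be worth checking whether the Open MPI you already have was built with CUDA support at all. Here is a rough sketch of that check, assuming the `ompi_info` tool that ships with Open MPI is on your PATH. The helper names (`parse_cuda_support`, `openmpi_is_cuda_aware`) are just made up for this example, and it is not the full set of "additional steps", only a first diagnostic:

```python
import shutil
import subprocess

# Key that Open MPI's `ompi_info --parsable --all` prints for its CUDA build flag.
CUDA_KEY = "mpi_built_with_cuda_support:value:"


def parse_cuda_support(ompi_info_output):
    """Return True/False if the CUDA build flag is found in the text, else None."""
    for line in ompi_info_output.splitlines():
        if CUDA_KEY in line:
            # The value is the last colon-separated field, e.g. "...:value:true".
            return line.strip().rsplit(":", 1)[-1].lower() == "true"
    return None


def openmpi_is_cuda_aware():
    """Run ompi_info and report CUDA support; None if it cannot be determined."""
    exe = shutil.which("ompi_info")
    if exe is None:
        # Open MPI is not installed, or not on PATH (assumption: nothing to check).
        return None
    result = subprocess.run(
        [exe, "--parsable", "--all"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_cuda_support(result.stdout)


if __name__ == "__main__":
    print("CUDA-aware Open MPI:", openmpi_is_cuda_aware())
```

If this reports False (which I would expect for an Open MPI that was built without GPU support, as in the quoted documentation), the extra steps most likely start with getting an Open MPI build that has CUDA enabled.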