Is there a particular reason why the MPI initialization in pytorch.distributed only supports MPI_THREAD_SERIALIZED rather than MPI_THREAD_MULTIPLE?
I tried changing it to MPI_THREAD_MULTIPLE and was able to build PyTorch from source successfully. Are there particular cases where MPI_THREAD_MULTIPLE fails for PyTorch?