I had the same issue when trying to process multiple models in parallel for model combination.
I fixed the problem by NEVER calling destroy_process_group(). Instead, after calling init_process_group(…) once, I only check whether the process group has already been initialised via torch.distributed.is_initialized().
I observed that once you destroy a process group, you cannot initialise a new one anymore, because you run into this timeout.
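Here is a minimal sketch of the workaround. The helper name, backend choice, and the address/port defaults are my own assumptions, not anything from a specific codebase:

```python
import os

import torch.distributed as dist


def ensure_process_group(rank: int, world_size: int) -> None:
    """Initialise the default process group once; later calls are no-ops.

    Assumption: single-machine setup, so MASTER_ADDR/MASTER_PORT default
    to localhost and an arbitrary free port.
    """
    if dist.is_initialized():
        return  # already set up; never tear it down and re-create it
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(
        backend="gloo",  # assumption: switch to "nccl" for GPU training
        rank=rank,
        world_size=world_size,
    )


# Call ensure_process_group(...) before each model's collective work
# instead of pairing every init_process_group() with destroy_process_group().
```

Calling this before each round of collective operations is safe because the second and later calls return immediately, whereas an init/destroy cycle per model is what triggered the timeout for me.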