Modify process group during runtime

I have three GPUs with ranks [0, 1, 2]. If rank 2 hits a deadlock and times out, I would like to keep the remaining two ranks running by forming a new group containing only ranks 0 and 1. Since only all_reduce operations are involved, the collective should still produce correctly shaped results with two ranks. To do this, however, I would apparently need to call init_process_group again, which in turn requires calling destroy_process_group first. The challenge is that both functions must be called by all processes, and I have no control over the stuck rank 2. Is there a way to adjust the communicator at runtime using torch?
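For reference, here is a minimal runnable sketch of the setup I have in mind, in the happy case where all three ranks are alive. It uses the gloo backend on CPU so it runs without GPUs (with GPUs it would be nccl); the port helper and worker names are just illustrative. It also shows where the approach runs into trouble: new_group is itself a collective over the full world, so a deadlocked rank 2 would block it as well.

```python
import socket

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def _free_port():
    # Grab an unused TCP port for the rendezvous (illustrative helper).
    s = socket.socket()
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port


def worker(rank, world_size, port):
    # Gloo on CPU so the sketch runs without GPUs.
    dist.init_process_group(
        "gloo",
        init_method=f"tcp://127.0.0.1:{port}",
        rank=rank,
        world_size=world_size,
    )
    # The catch: new_group must be called by *every* process in the
    # original world, so a deadlocked rank 2 would block this call too.
    survivors = dist.new_group(ranks=[0, 1])
    t = torch.tensor([float(rank)])
    if rank in (0, 1):
        # Only members of the new group participate in its collectives.
        dist.all_reduce(t, group=survivors)  # 0.0 + 1.0 on ranks 0 and 1
        assert t.item() == 1.0
    dist.destroy_process_group()


def run(world_size=3):
    mp.spawn(worker, args=(world_size, _free_port()), nprocs=world_size)
    return True


if __name__ == "__main__":
    run()
```

In other words, even the subgroup route (new_group instead of a full re-init) has the same requirement that all ranks, including the stuck one, make the call.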