Hello everyone,
I am trying to use PyTorch distributed for training and inference, but I don't like the fact that there is a single master node. I would like all the nodes to keep working even if the master node goes down.
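For context, here is roughly how I am initializing things right now (a minimal sketch; the IP address is just a placeholder), so you can see where the master-node dependency comes in:

```python
import os
import torch.distributed as dist

# Every rank rendezvous through one "master" node.
# If the node at MASTER_ADDR goes down, the whole job dies with it --
# this is the single point of failure I would like to avoid.
os.environ.setdefault("MASTER_ADDR", "10.0.0.1")  # placeholder master IP
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(
    backend="gloo",        # or "nccl" on GPU nodes
    init_method="env://",  # reads MASTER_ADDR / MASTER_PORT from the env
    rank=int(os.environ["RANK"]),
    world_size=int(os.environ["WORLD_SIZE"]),
)
```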
Is there a way to get rid of the master node, or maybe to designate multiple nodes as masters? Any hint or help will be appreciated.
Thanks in advance