DDP with gloo is stuck at initialization

I am trying to use gloo for distributed training, but the script hangs while initializing the process group.

dist.init_process_group(backend="gloo", init_method="tcp://", rank=1, world_size=3)

I am running the scripts on a 3-node CPU cluster on Azure Databricks.

Please help.

Can you include more information, such as your script and the logs? You can run with TORCH_CPP_LOG_LEVEL=INFO to get more detailed logs.
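For reference, here is a minimal sketch of how such a debug run could be set up. It is not the original poster's script. The helper names (build_init_method, init_gloo), the use of RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT environment variables, and the default port 29500 are all assumptions made for illustration. One thing worth checking while collecting logs: init_process_group blocks until all world_size processes have joined, so each of the 3 nodes needs its own rank (0, 1, 2) and the same address and port, which should point at the rank 0 node.

```python
import datetime
import os

# Set before torch is imported, to be safe, so the C++ logger sees it.
os.environ.setdefault("TORCH_CPP_LOG_LEVEL", "INFO")


def build_init_method(host: str, port: int) -> str:
    """Return a tcp:// rendezvous URL for the given host and port."""
    return f"tcp://{host}:{port}"


def init_gloo(rank: int, world_size: int, host: str, port: int,
              timeout_s: int = 120) -> None:
    """Join a gloo process group, then tear it down again."""
    # Imported lazily so the environment variable above is already in place.
    import torch.distributed as dist

    dist.init_process_group(
        backend="gloo",
        init_method=build_init_method(host, port),
        rank=rank,              # must differ on every node: 0, 1, 2
        world_size=world_size,  # total number of processes, here 3
        # Shorter than the default, so a stalled rendezvous should
        # raise an error instead of waiting for a long time.
        timeout=datetime.timedelta(seconds=timeout_s),
    )
    print(f"rank {rank} of {world_size} initialized")
    dist.destroy_process_group()


if __name__ == "__main__":
    # Assumed convention: each node exports RANK and MASTER_ADDR itself.
    if "RANK" in os.environ and "MASTER_ADDR" in os.environ:
        init_gloo(
            rank=int(os.environ["RANK"]),
            world_size=int(os.environ.get("WORLD_SIZE", "3")),
            host=os.environ["MASTER_ADDR"],
            port=int(os.environ.get("MASTER_PORT", "29500")),
        )
    else:
        print("Set RANK and MASTER_ADDR before running this sketch.")
```

Run it once per node with a different RANK on each, and post the output from every node. If the nodes have more than one network interface, setting GLOO_SOCKET_IFNAME to the interface the nodes can reach each other on may also be worth trying.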