PyTorch 2.0.1 fails to run simple code that works on PyTorch 1.13.0

import torch
import torch.distributed as dist

# Code runs on each rank.
dist.init_process_group("nccl")
rank = dist.get_rank()
device = torch.device(rank)  # an integer argument selects cuda:<rank>

world_size = dist.get_world_size()
datashape = world_size  # one tensor element per rank

input_data = torch.ones(datashape).to(device)
input_data *= rank
print(f"RAnk {rank} data: {input_data}")

# Broadcast rank 0's data to every other rank via point-to-point send/recv.
if rank == 0:
    for i in range(1, world_size):
        dist.send(input_data, i)
        print(f"Rank {rank} sent data to rank {i}")
else:
    dist.recv(input_data, 0)
    print(f"Rank {rank} received data from rank 0")
print(f"After broadcast, rank {rank} input data: {input_data}")

Run the code above with this command:
torchrun --nnodes 1 --nproc_per_node 2 --standalone broadcast.py
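
If more detailed logs would help, the same run can be repeated with debug output switched on (NCCL_DEBUG is NCCL's standard logging variable and TORCH_DISTRIBUTED_DEBUG is PyTorch's; I have not included that output here):

NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL torchrun --nnodes 1 --nproc_per_node 2 --standalone broadcast.py
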
On PyTorch 1.13.0 it finishes with the correct result. However, on PyTorch 2.0.1 it fails with this error:

master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

Rank 0 data: tensor([0., 0.], device='cuda:0')
Rank 1 data: tensor([1., 1.], device='cuda:1')
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 530) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

broadcast.py FAILED
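
For reference, the same data movement can also be written with the collective API instead of explicit send/recv. This is only a minimal sketch under the same single-node, one-process-per-GPU assumptions (the torch.cuda.set_device call is an addition, not part of the original repro), and I have not verified whether it behaves differently on 2.0.1:

import torch
import torch.distributed as dist

# Same setup as the repro above: one process per GPU, NCCL backend.
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)  # pin this process to its own GPU (not in the original repro)
device = torch.device("cuda", rank)

input_data = torch.ones(dist.get_world_size(), device=device) * rank
print(f"Rank {rank} data before: {input_data}")

# Collective broadcast: rank 0's tensor is copied into every other rank's tensor.
dist.broadcast(input_data, src=0)
print(f"Rank {rank} data after: {input_data}")

dist.destroy_process_group()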