Hi,
I’m trying to launch a DDP training script (train.py) on a 4-GPU machine.
I’m using the launch.py tool described here (this experience is quite ugly btw, I wish there were a clean PyTorch class to do that!), which is supposed to set local_rank properly in each process: “--local_rank: This is passed in via launch.py”, as the documentation says.
python /home/ec2-user/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/distributed/launch.py \
--nnodes=1 \
--node_rank=0 \
--nproc_per_node=4 \
train.py \
--gpu-count 4 \
--dataset . \
--cache tmp \
--height 604 \
--width 960 \
--checkpoint-dir . \
--batch 10 \
--workers 24 \
--log-freq 20 \
--prefetch 2 \
--bucket $bucket \
--eval-size 10 \
--iterations 20 \
--class-list a2d2_images/camera_lidar_semantic/class_list.json
However, in each of my processes local_rank is -1 (the default value). What is wrong? How do I get a distinct local_rank in each process?
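For context, the argument is declared in train.py roughly like this (a minimal sketch, not my exact script), which is where the -1 default comes from; launch.py is supposed to append --local_rank=&lt;n&gt; to each subprocess’s argv:

```python
import argparse

parser = argparse.ArgumentParser()
# torch.distributed.launch appends --local_rank=<n> to the command line
# of each spawned worker; -1 means the flag was never received.
parser.add_argument("--local_rank", type=int, default=-1)
args, _ = parser.parse_known_args()
print(args.local_rank)
```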