This doc encourages to use torchrun
. But doesn’t tell how to install it
How to install and get started with torchrun
?
3 Likes
torchrun
is part of PyTorch v1.10. If you are running an older version, python -m torch.distributed.run
command serves the same purpose.
3 Likes
@cbalioglu shall I run python -m torch.distributed.run
once for the whole cluster? or once per node? or once per GPU? also asked here How to map processes to GPU in DDP and how to launch the DDP cluster?
It should be once per node IIUC