I am trying to use the torch.distributed package on a cluster, but I can’t find any relevant docs or an example to start with.
I am aware of this link, but I think the API has changed considerably from what it describes. For example, the function get_rank is now defined in torch.distributed.collectives. I am also unable to locate the pytorch_exec script needed to launch my program.
Please point me in the right direction.
Thanks and regards,