Is there any way to get the number of processes per node in distributed training? In Horovod we could use hvd.local_size(), but I found no equivalent in the distributed module. Thanks.
There is no concept of local versus remote processes; there is only the total number of processes, available through torch.distributed.get_world_size().
To help us understand whether we need to add this: what do you need it for?
For example, I want to fully use all CPUs for data loading without oversubscribing them. So I want to divide num_workers by the local size.
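One possible workaround, assuming the job is started with a launcher that exports the LOCAL_WORLD_SIZE environment variable (torchrun does): read the local size from the environment and split the node's CPUs across the local processes. This is a sketch, not an official API; the helper names are made up, and the fallback of 1 only applies to single-process runs.

```python
import os


def local_size():
    """Processes on this node, read from the LOCAL_WORLD_SIZE env var.

    Assumption: the launcher (e.g. torchrun) sets LOCAL_WORLD_SIZE.
    Falls back to 1 for a plain single-process run.
    """
    return int(os.environ.get("LOCAL_WORLD_SIZE", "1"))


def workers_per_process(total_cpus=None):
    """Divide the node's CPUs evenly among its training processes,
    so the DataLoader num_workers of all local ranks sum to ~all CPUs."""
    cpus = total_cpus if total_cpus is not None else (os.cpu_count() or 1)
    return max(1, cpus // local_size())
```

Each rank would then pass `num_workers=workers_per_process()` to its DataLoader, so data-loading workers across the node do not oversubscribe the CPUs.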
@pietern Any solution for this? Thanks.