Hello, is there a way to find out the PCIe/network throughput available to a worker during the forward and backward passes? Note that I am only interested in per-worker throughput, not system-wide throughput.
Hi, there are no utilities in the PyTorch Distributed package that do this out of the box, but here are some suggestions:
- NVIDIA offers NCCL tests (NVIDIA/nccl-tests on GitHub), which evaluate the performance/throughput of collectives. You could use them to estimate the throughput available to a particular worker.
- Another option is the lspci command (lspci(8): all PCI devices, Linux man page). With -vv it also reports each device's PCIe link speed and width (the LnkSta line).
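If you would rather measure from inside your training job than run nccl-tests separately, the sketch below times `torch.distributed.all_reduce` on one rank and converts the time into the "algorithm bandwidth" and "bus bandwidth" numbers that nccl-tests reports (for all-reduce, busbw = algbw × 2(n−1)/n). It is a minimal sketch, not a replacement for nccl-tests. It assumes the NCCL backend, one GPU per rank, and that `init_process_group` and `torch.cuda.set_device` have already been called. `measure_allreduce`, `allreduce_bandwidth` and the default sizes are hypothetical names and values chosen for illustration. Every rank gets its own numbers, but a collective is still limited by the slowest link among the participating ranks.

```python
import time


def allreduce_bandwidth(size_bytes, seconds, world_size):
    """Return (algbw, busbw) in GB/s (1e9 bytes/s) using the all-reduce
    definitions from nccl-tests: algbw = S / t, busbw = algbw * 2(n-1)/n."""
    algbw = size_bytes / seconds / 1e9
    busbw = algbw * 2 * (world_size - 1) / world_size
    return algbw, busbw


def measure_allreduce(numel=64 * 1024 * 1024, iters=20, warmup=5):
    """Time all_reduce as seen by this rank.

    Assumes dist.init_process_group("nccl") was already called and the
    current CUDA device was set for this rank (hypothetical helper)."""
    import torch  # lazy import: allreduce_bandwidth above works without PyTorch
    import torch.distributed as dist

    # Zeros keep the summed values finite across iterations; only timing matters.
    tensor = torch.zeros(numel, dtype=torch.float32, device="cuda")
    for _ in range(warmup):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()  # make sure warmup kernels have finished

    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()  # all_reduce is asynchronous on the GPU
    elapsed = (time.perf_counter() - start) / iters

    size_bytes = tensor.numel() * tensor.element_size()
    return allreduce_bandwidth(size_bytes, elapsed, dist.get_world_size())
```

Launched under torchrun, each process would call `measure_allreduce()` after setting up its process group and print the result with its rank. In DDP the gradient all-reduce happens during the backward pass, so the bus bandwidth is a rough proxy for what that phase can get.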
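The LnkSta line reported by `lspci -vv` shows the negotiated speed and width of a PCIe link. From these you can compute a theoretical per-direction upper bound for the GPU a worker is pinned to. The sketch below assumes the common LnkSta formats (`Speed 8GT/s, Width x16`, optionally followed by `(ok)` or `(downgraded)`). It only covers PCIe Gen1 to Gen5 (8b/10b encoding up to 5 GT/s, 128b/130b for 8 to 32 GT/s) and ignores packet/protocol overhead, so real transfers will be lower. The function names and the example device address are hypothetical. On many systems, non-root users see `Capabilities: <access denied>` instead of the link details.

```python
import re
import subprocess

# Line-code efficiency per transfer rate in GT/s (PCIe Gen1-Gen5 only).
# Gen6+ (PAM4/FLIT mode) is deliberately not covered by this sketch.
ENCODING_EFFICIENCY = {
    2.5: 8 / 10,     # Gen1, 8b/10b
    5.0: 8 / 10,     # Gen2, 8b/10b
    8.0: 128 / 130,  # Gen3, 128b/130b
    16.0: 128 / 130, # Gen4, 128b/130b
    32.0: 128 / 130, # Gen5, 128b/130b
}

# Matches "LnkSta:" (not "LnkSta2:" or "LnkCap:") and stays on one line.
LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_lnksta(text):
    """Return [(speed_gts, lanes), ...] for every LnkSta line in lspci -vv output."""
    return [(float(speed), int(width)) for speed, width in LNKSTA_RE.findall(text)]


def theoretical_gbytes_per_s(speed_gts, lanes):
    """Theoretical one-direction bandwidth in GB/s (1e9 bytes/s), before
    protocol overhead. Raises KeyError for rates not in the table."""
    return speed_gts * ENCODING_EFFICIENCY[speed_gts] * lanes / 8


def link_bandwidth_for_device(bdf):
    """Run lspci -vv for one device, e.g. bdf="3b:00.0" (hypothetical address).
    Requires lspci and usually root to see the link capability block."""
    out = subprocess.run(
        ["lspci", "-vv", "-s", bdf], capture_output=True, text=True, check=True
    ).stdout
    return [theoretical_gbytes_per_s(s, w) for s, w in parse_lnksta(out)]
```

For example, a Gen3 x16 link (8 GT/s, 16 lanes) works out to about 15.75 GB/s per direction. If LnkSta reports a lower speed or width than LnkCap, the link has trained down and that worker gets correspondingly less.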