Find PCIe/network throughput per worker

Hello, is there a way to find out the PCIe/network throughput available to a worker during the forward and backward passes? Please note that I am only interested in per-worker throughput, not system-wide throughput.

Hi, there are no utilities in the PyTorch Distributed package that do this out of the box, but here are some suggestions:

  1. NVIDIA offers the NCCL tests (GitHub: NVIDIA/nccl-tests), which evaluate the performance/throughput of collectives. You could use them to estimate the throughput available to a particular worker.
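  2. Another option is the lspci command (see the lspci(8) Linux man page), which lists all PCI devices. With lspci -vv it also reports each device's negotiated PCIe link speed and width.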
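The nccl-tests binaries (for example all_reduce_perf) report two bandwidth columns per message size, algbw and busbw. busbw rescales algbw by a per-collective factor so it reflects the bandwidth each rank's links actually carry, which is closer to a per-worker number. Below is a minimal sketch of that conversion, assuming the factors described in nccl-tests' PERFORMANCE.md. You could also apply it to timings you collect yourself, such as a timed all-reduce on one rank. The function names are hypothetical and are not part of nccl-tests or PyTorch.

```python
def bus_bw_factor(collective: str, n: int) -> float:
    """Factor converting algorithm bandwidth to bus bandwidth over n ranks
    (per nccl-tests' PERFORMANCE.md; names here are hypothetical)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if collective == "all_reduce":
        return 2 * (n - 1) / n
    if collective in ("all_gather", "reduce_scatter"):
        return (n - 1) / n
    if collective in ("broadcast", "reduce"):
        return 1.0
    raise ValueError(f"unknown collective: {collective}")


def bandwidths_gbps(size_bytes: int, seconds: float,
                    collective: str, n: int) -> tuple[float, float]:
    """Return (algbw, busbw) in GB/s (1 GB = 1e9 bytes).

    size_bytes follows the nccl-tests "size" convention for the collective,
    and seconds is the measured time of one operation on this rank.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    algbw = size_bytes / seconds / 1e9
    return algbw, algbw * bus_bw_factor(collective, n)
```

For example, a 1 GB all-reduce across 8 ranks that takes 0.5 s gives an algbw of 2.0 GB/s and a busbw of 3.5 GB/s. That busbw figure can then be compared against the link's rated bandwidth.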
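With the lspci approach, the LnkSta line of lspci -vv gives the link speed (in GT/s) and width (in lanes) that a device actually negotiated, and a theoretical per-direction bandwidth follows from those two values. The rough sketch below assumes LnkSta lines of the form "LnkSta: Speed 8GT/s, Width x16". The helper name is hypothetical, and it only covers the PCIe Gen1 to Gen5 line encodings. It ignores packet/protocol overhead, so real throughput will be lower than the number it returns.

```python
import re

# Usable fraction of each transferred bit, by link speed (GT/s):
# Gen1/Gen2 (2.5, 5 GT/s) use 8b/10b; Gen3-Gen5 (8, 16, 32 GT/s) use 128b/130b.
_ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130, 16.0: 128 / 130, 32.0: 128 / 130}

# Matches "LnkSta:" (not "LnkSta2:") followed by the speed and width on the same line.
_LNKSTA = re.compile(r"LnkSta:\s*Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def pcie_link_gbps(lspci_text: str) -> float:
    """Theoretical per-direction bandwidth in GB/s from lspci -vv output
    (hypothetical helper; ignores TLP/DLLP protocol overhead)."""
    m = _LNKSTA.search(lspci_text)
    if m is None:
        raise ValueError("no LnkSta line found")
    gts = float(m.group(1))
    width = int(m.group(2))
    if gts not in _ENCODING:
        raise ValueError(f"unsupported link speed: {gts} GT/s")
    # GT/s * encoding efficiency = Gbit/s per lane; * lanes / 8 -> GB/s.
    return gts * _ENCODING[gts] * width / 8
```

You could feed it the output of lspci -vv -s followed by the GPU's PCI address. Running lspci as root may be needed, because otherwise the capability details can show up as access denied. A device running below its rated speed or width (compare LnkSta with LnkCap) is one cause of lower per-worker throughput.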