What's the recommended way of profiling DistributedDataParallel PyTorch code?

anneouyang · February 3, 2021, 10:19am

I'd like to see how much time is spent on interprocess communication (allreduce, etc.), as well as static memory usage and the amount of data transferred during interprocess communication.

osalpekar · February 3, 2021, 6:22pm

nvprof from NVIDIA is probably the best tool available. See the NVIDIA Developer Blog post "CUDA Pro Tip: nvprof is Your Handy Universal GPU Profiler".
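
As a rough sketch of how nvprof might be pointed at a multi-process DDP job: the helper below only builds the command line and does not run it. The script name `train.py`, the output file pattern, and the helper name `build_nvprof_command` are hypothetical. The flags `--profile-child-processes` and `-o` (with `%p` expanding to the process ID) are nvprof options, but whether they capture every worker depends on how your job is launched, so treat this as a starting point.

```python
def build_nvprof_command(script, script_args=(), output_pattern="ddp_profile.%p.nvvp"):
    """Build (but do not run) an nvprof command for a script that spawns workers.

    Assumptions: `script` is a hypothetical training entry point, and the
    worker processes are children of the profiled process.
    """
    return [
        "nvprof",
        "--profile-child-processes",  # also profile processes spawned by the script
        "-o", output_pattern,         # one output file per process; %p is the PID
        "python", script, *script_args,
    ]


# Print the command so it can be inspected or pasted into a shell.
cmd = build_nvprof_command("train.py", ["--epochs", "1"])
print(" ".join(cmd))
```

Each resulting per-process file can then be opened separately to look at the time spent in the NCCL communication kernels for that rank.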

DDP currently does not work with the autograd profiler.
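
For the "amount of data transferred" part of the question, a back-of-the-envelope estimate can complement the profiler output. The sketch below assumes gradients are synchronized with a ring all-reduce, in which each rank sends roughly 2 * (N - 1) / N times the gradient payload per step. That is an assumption: the backend may pick a different algorithm, and DDP's gradient bucketing is ignored here. The function name and the example parameter count are hypothetical.

```python
def ring_allreduce_bytes_per_rank(num_params, world_size, bytes_per_param=4):
    """Estimate bytes sent by one rank for one all-reduce of all gradients.

    Assumes a ring all-reduce (reduce-scatter followed by all-gather), where
    each rank sends 2 * (N - 1) / N times the payload. Real traffic may
    differ depending on the algorithm the backend selects.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    payload = num_params * bytes_per_param  # total gradient size in bytes
    return 2 * (world_size - 1) * payload / world_size


# Hypothetical example: 25 million fp32 parameters on 4 ranks.
estimate = ring_allreduce_bytes_per_rank(25_000_000, 4)
print(f"{estimate / 1e6:.0f} MB sent per rank per step")  # prints "150 MB sent per rank per step"
```

Comparing this estimate against the all-reduce time reported by the profiler gives a rough effective bandwidth figure for the interconnect.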