To see how much time is spent on interprocess communication (allreduce, etc.), as well as static memory usage and the amount of information transferred during interprocess communication.
nvprof from NVIDIA is probably the best tool available; see the NVIDIA Developer Blog post "CUDA Pro Tip: nvprof is Your Handy Universal GPU Profiler".
DDP currently does not work with the autograd profiler.
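
For the "amount of information transferred" part, a back-of-the-envelope estimate can be a useful sanity check next to whatever the profiler reports. The sketch below assumes a ring all-reduce (reduce-scatter followed by all-gather), which is only one possible algorithm; the backend may pick a different one, and DDP's gradient bucketing is ignored. The function name and the example parameter count are hypothetical.

```python
def ring_allreduce_bytes_per_rank(num_params: int,
                                  world_size: int,
                                  bytes_per_param: int = 4) -> float:
    """Estimate the bytes each rank sends for one gradient all-reduce.

    Assumption: ring all-reduce, where the reduce-scatter and the
    all-gather phases each move (N - 1) / N of the buffer per rank.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    buffer_bytes = num_params * bytes_per_param
    return 2 * (world_size - 1) / world_size * buffer_bytes


# Hypothetical example: 25M float32 parameters on 4 processes.
estimate = ring_allreduce_bytes_per_rank(25_000_000, 4)
print(f"{estimate / 1e6:.0f} MB sent per rank per step")
```

This gives only an estimate of the traffic volume per step; it says nothing about how long the transfer takes, which still has to be measured with a profiler such as nvprof.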