I’ve been using DDP for all my distributed training and would now like to use TensorBoard for visualization and logging. The only solution I can think of is to call “gather” to collect values on the rank 0 process each time I want to log an item to the board, since each process/GPU only has a subset of the data and statistics. This seems somewhat cumbersome, and I’m not sure whether it hurts distributed efficiency. I wonder:
- if doing so indeed hurts distributed efficiency
- if there are recommended practices for using TensorBoard in the DDP setting? If so, what are they?
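
To make the question concrete, here is roughly the pattern I have in mind. It is only a minimal sketch: `reduce_mean_to_rank0` and `log_scalar` are names I made up for illustration, and I use `dist.reduce` rather than `gather` because for a scalar like the loss I only need the mean on rank 0. The `writer` is assumed to be a `torch.utils.tensorboard.SummaryWriter` created only on rank 0 (and `None` on the other ranks).

```python
import torch
import torch.distributed as dist


def _dist_ready() -> bool:
    """True only when a process group has actually been initialized."""
    return dist.is_available() and dist.is_initialized()


def reduce_mean_to_rank0(value: torch.Tensor) -> torch.Tensor:
    """Average a metric across ranks. The result is only meaningful on rank 0."""
    reduced = value.detach().clone()
    if not _dist_ready():
        # Single-process fallback: nothing to reduce.
        return reduced
    # Collective call: every rank must reach this line.
    dist.reduce(reduced, dst=0, op=dist.ReduceOp.SUM)
    return reduced / dist.get_world_size()


def log_scalar(writer, tag: str, value: torch.Tensor, step: int) -> None:
    """Called on every rank; only rank 0 writes to the board."""
    mean = reduce_mean_to_rank0(value)
    rank = dist.get_rank() if _dist_ready() else 0
    if rank == 0 and writer is not None:
        writer.add_scalar(tag, mean.item(), step)
```

In the training loop I would then call something like `log_scalar(writer, "train/loss", loss, step)` on every rank every N steps, so each logged value costs one extra collective.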
Thanks!