Using TensorBoard with DistributedDataParallel

@Jimmy2027: I was able to make logging work by moving the SummaryWriter creation from the main process to the child processes. Specifically, remove

self.logger = SummaryWriter(dir_logs)

And add in run_epochs

exp.logger = SummaryWriter(exp.dir_logs)

so that we don’t have to fork the lock inside SummaryWriter (in _AsyncWriter https://github.com/tensorflow/tensorboard/blob/master/tensorboard/summary/writer/event_file_writer.py#L163). In general, each child process should create its own SummaryWriter instead of inheriting one forked from the parent process.

Also, unrelated to your issue: tensorboardX has long been deprecated and is no longer actively maintained, having been replaced by PyTorch's native TensorBoard support since PyTorch 1.2. To use it, simply replace

from tensorboardX import SummaryWriter

with

from torch.utils.tensorboard import SummaryWriter 