Logging in parallel with the backward pass

In one of my training loops I’m logging quite a bit of debug information after every parameter update. When profiling the loop with cProfile, I noticed that string formatting takes up a significant chunk of the total training runtime (around 5%).

That doesn’t seem right to me, since the forward and backward passes take around 2-3 seconds per iteration. Is there some way to run the logging in parallel with the backward pass? And is that likely to solve the problem? (I’ve put a sketch of what I had in mind after the loop below.)
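Before restructuring anything, I want to check whether cProfile is really measuring formatting work or a hidden GPU sync: as I understand it, formatting a CUDA scalar ends up calling .item(), which blocks until the device has finished the backward pass, and cProfile would attribute that wait to the format call. Something like this inside the loop (torch.cuda.synchronize() forces the wait to happen up front) should tell the difference:

import torch

torch.cuda.synchronize()  # flush all queued GPU work first
# if the format call is still slow after the explicit sync,
# the time really is spent building the string
msg = "loss was {:1.3e}".format(loss.item())
logger.info(msg)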

My training loop looks something like this:

for img in dataloader:
  img = img.to('cuda')

  optimizer.zero_grad()

  pred, label = network(img)
  loss = loss_fn(pred, label)
  loss.backward()

  optimizer.step()

  # this is the formatting call that cProfile flags
  fmt = "loss was {:1.3e}"
  msg = fmt.format(loss.item())
  logger.info(msg)

  # some more complicated logging stuff
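And here is a rough sketch of the kind of thing I had in mind for the parallel logging: hand the detached loss tensor to a background thread and let that thread do the .item() call and the formatting. This is untested and reuses the names from the loop above; log_queue and log_worker are placeholders I made up:

import queue
import threading

log_queue = queue.Queue()

def log_worker():
  # background thread: .item() still waits for the GPU here,
  # but the wait no longer blocks the training loop
  while True:
    loss_t = log_queue.get()
    if loss_t is None:  # sentinel: shut the worker down
      break
    logger.info("loss was {:1.3e}".format(loss_t.item()))

worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

for img in dataloader:
  img = img.to('cuda')
  optimizer.zero_grad()

  pred, label = network(img)
  loss = loss_fn(pred, label)
  loss.backward()
  optimizer.step()

  # enqueue the detached tensor instead of formatting it here
  log_queue.put(loss.detach())

log_queue.put(None)  # stop the worker after training
worker.join()

What I can’t tell is whether this actually buys anything: does the .item() call in the worker thread overlap with the next iteration’s forward and backward pass, or does it just move the same wait onto a different thread?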