I am trying to set up a training workflow with PyTorch DistributedDataParallel (DDP). Normally when I train, I pass a logger through to track outputs and record useful information. However, I am having trouble using my logger with DDP. Right now my code looks like this:
```python
import torch
import torch.multiprocessing as mp


class BaseModel:
    def __init__(self, *args, **kwargs):
        ...  # does things

    def fit(self, *args, **kwargs):
        ...  # set up stuff
        mp.spawn(
            self.distributed_training,
            nprocs=self.num_gpus,
            args=(self.params, training_input, self.logger),
        )

    def distributed_training(self, rank, params, training_input, logger):
        ...
        for e in range(epochs):
            # trains for an epoch
            logger.info(print_line)
```
I know I am supposed to use the QueueHandler and QueueListener tools from the logging module, but I have been scouring the internet and still do not have a clear understanding of how. Any help would be greatly appreciated.
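For context, my current (possibly wrong) understanding of the pattern is something like the sketch below. The names `worker`, `main`, and `log_queue` are my own placeholders, and I've used the stdlib `multiprocessing` here just to illustrate (I believe `torch.multiprocessing` wraps it): each spawned process forwards log records to a shared queue via a `QueueHandler`, and a `QueueListener` in the main process drains that queue into a real handler.

```python
import logging
import logging.handlers
import multiprocessing as mp


def worker(rank, log_queue):
    # In each spawned process: attach a QueueHandler so records
    # go onto the shared queue instead of to a handler directly.
    logger = logging.getLogger(f"worker-{rank}")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.info("hello from rank %d", rank)


def main(handler=None):
    # In the main process: a QueueListener drains the queue and
    # hands each record to a real handler (console, file, ...).
    if handler is None:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    log_queue = mp.Queue()
    listener = logging.handlers.QueueListener(log_queue, handler)
    listener.start()

    procs = [mp.Process(target=worker, args=(rank, log_queue)) for rank in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    listener.stop()  # processes any remaining records, then shuts down


if __name__ == "__main__":
    main()
```

What I don't understand is how to adapt this to my class above, i.e. whether the queue should be created in `fit` and passed through `mp.spawn` alongside the other args.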