I am trying to set up a training workflow with PyTorch DistributedDataParallel (DDP). Normally when I train, I pass a logger through to track outputs and record useful information. However, I am having trouble using my logger with DDP. Right now my code looks like this:
```python
import torch
import torch.multiprocessing as mp


class BaseModel:
    def __init__(self, *args, **kwargs):
        ...  # does things

    def fit(self, *args, **kwargs):
        ...  # set up stuff
        mp.spawn(
            self.distributed_training,
            nprocs=self.num_gpus,
            args=(self.params, training_input, self.logger),
        )

    def distributed_training(self, rank, params, training_input, logger):
        ...
        for e in range(epochs):
            # trains for an epoch
            logger.info(print_line)
```
I know I am supposed to use the QueueHandler and QueueListener tools from the logging module, but I have been scouring the internet and still do not have a clear understanding of how. Any help would be greatly appreciated.
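For context, my current (possibly wrong) understanding of the pattern is something like the sketch below. The names `worker`, `main`, and `log_queue` are my own placeholders, and I've used the stdlib `multiprocessing` here just to illustrate (I believe `torch.multiprocessing` wraps it): each spawned process forwards log records to a shared queue via a `QueueHandler`, and a `QueueListener` in the main process drains that queue into a real handler.

```python
import logging
import logging.handlers
import multiprocessing as mp


def worker(rank, log_queue):
    # In each spawned process: attach a QueueHandler so records
    # go onto the shared queue instead of to a handler directly.
    logger = logging.getLogger(f"worker-{rank}")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.info("hello from rank %d", rank)


def main(handler=None):
    # In the main process: a QueueListener drains the queue and
    # hands each record to a real handler (console, file, ...).
    if handler is None:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    log_queue = mp.Queue()
    listener = logging.handlers.QueueListener(log_queue, handler)
    listener.start()

    procs = [mp.Process(target=worker, args=(rank, log_queue)) for rank in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    listener.stop()  # processes any remaining records, then shuts down


if __name__ == "__main__":
    main()
```

What I don't understand is how to adapt this to my class above, i.e. whether the queue should be created in `fit` and passed through `mp.spawn` alongside the other args.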