PyTorch 1.8 distributed mode disables the Python logging module

Test code is like this:

import sys
import argparse
import logging
import torch.distributed as dist

def parse_args():
    parse = argparse.ArgumentParser()
    parse.add_argument('--local_rank', dest='local_rank', type=int, default=-1,)
    return parse.parse_args()

args = parse_args()
dist.init_process_group(backend='nccl')  # init_method='tcp://{}'.format(cfg.port),

def set_logger():
    logfile = 'log.txt'
    log_level = logging.INFO
    FORMAT = '%(levelname)s %(filename)s(%(lineno)d): %(message)s'
    logging.basicConfig(level=log_level, format=FORMAT, filename=logfile)

set_logger()
logger = logging.getLogger('acb')
logger.info('hello')

If I launch the code like this:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2

I see no log messages. This is not the same as in PyTorch 1.7.

This might be related to this issue, which has a potential fix.

Hi, is there any temporary workaround on the user side until the upstream fix lands?

You could either cherry-pick the mentioned PR and build from source, or, I think, call logging.basicConfig(..., force=True) in your code base.

Adding force does not work for me. I use Python 3.7; should I update to Python 3.8?

Yes, I think so. From the docs:

Changed in version 3.8: The force argument was added.
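Since force only exists from 3.8 onward, a version guard keeps the same code importable on 3.7 (where it simply cannot override an existing configuration). This guard is an illustrative sketch, not something from the thread:

```python
import logging
import sys

# Build the basicConfig arguments, adding force only where supported.
kwargs = dict(
    level=logging.INFO,
    format='%(levelname)s %(filename)s(%(lineno)d): %(message)s',
)
if sys.version_info >= (3, 8):
    kwargs['force'] = True  # the force argument was added in Python 3.8

logging.basicConfig(**kwargs)
```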

Hi @ptrblck,

This issue still exists, though it seems fixes were pushed in the issues you cited.
I am using the PyTorch 1.9.0 CUDA 10.2 build and Python 3.7 on Linux.

Do you know how and where to initialize the logger so that it works correctly?


EDIT: It seems I found an implementation that works. I had been scouring for a solution where logging works with torch.distributed and couldn’t seem to make it work. I also saw other users facing the same issue, where the logger wouldn’t print anything to the console or a file.

I looked at the logger implementation here

and this works. I am not sure what I was doing wrong, but I will investigate further if I have time. It seems there are subtle differences depending on which process initializes the logger and where you initialize it (before or after dist.init_process_group).
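For anyone hitting the same problem, a pattern that sidesteps basicConfig entirely is to attach handlers to a named logger yourself; that works regardless of what state the root logger is in. This is a hedged sketch: the setup_logger name, its arguments, and the per-rank file naming are my own illustration, not the linked implementation.

```python
import logging
import os
import sys

def setup_logger(name='train', rank=0, logfile='log.txt'):
    """Configure a named logger with explicit handlers.

    Does not rely on logging.basicConfig, so it works even if the root
    logger was already configured (or silenced) by another library.
    Every rank writes to its own file; only rank 0 also logs to stdout.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    fmt = logging.Formatter('%(levelname)s %(filename)s(%(lineno)d): %(message)s')

    fh = logging.FileHandler('{}.rank{}'.format(logfile, rank))
    fh.setFormatter(fmt)
    logger.addHandler(fh)

    if rank == 0:
        sh = logging.StreamHandler(sys.stdout)
        sh.setFormatter(fmt)
        logger.addHandler(sh)

    # Keep records away from the (possibly misconfigured) root logger.
    logger.propagate = False
    return logger

# In a real distributed script you would call this after
# dist.init_process_group, passing rank=dist.get_rank().
logger = setup_logger(rank=int(os.environ.get('RANK', 0)))
logger.info('logger is configured')
```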