PyTorch 1.8 distributed mode disables the Python logging module

Test code is like this:

import sys
import argparse
import logging
import torch
import torch.distributed as dist

def parse_args():
    parse = argparse.ArgumentParser()
    parse.add_argument('--local_rank', dest='local_rank', type=int, default=-1,)
    return parse.parse_args()

args = parse_args()
torch.cuda.set_device(args.local_rank)
dist.init_process_group(
    backend='nccl',
    #  init_method='tcp://127.0.0.1:{}'.format(cfg.port),
    init_method='env://',
    world_size=torch.cuda.device_count(),
    rank=args.local_rank
)


def set_logger():
    logfile = 'log.txt'
    log_level = logging.INFO
    FORMAT = '%(levelname)s %(filename)s(%(lineno)d): %(message)s'
    logging.basicConfig(level=log_level, format=FORMAT, filename=logfile)
    logging.root.addHandler(logging.StreamHandler(sys.stdout))

set_logger()
logger = logging.getLogger()
logger.info('acb')

If I launch the code like this:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 test.py

I see no log messages at all. This is different from PyTorch 1.7.

Might be related to this issue with a potential fix.
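A minimal illustration of one plausible mechanism (an assumption; the thread does not spell out what changed in 1.8): logging.basicConfig() does nothing if the root logger already has a handler. If anything attaches a root handler before set_logger() runs, the INFO level and the log file are silently never configured. The early handler below is a hypothetical stand-in for such an import, and demo_basicconfig_noop is an illustrative name.

```python
import io
import logging


def demo_basicconfig_noop():
    """Show that basicConfig() is a no-op once the root logger has a handler."""
    root = logging.getLogger()
    # Start from a clean root logger, as in a fresh interpreter.
    for handler in root.handlers[:]:
        root.removeHandler(handler)
    root.setLevel(logging.WARNING)

    # Hypothetical stand-in for an import that attaches a root handler early.
    captured = io.StringIO()
    root.addHandler(logging.StreamHandler(captured))

    # Same call pattern as set_logger(): silently ignored here, so the root
    # level stays at WARNING and no new handler is created.
    logging.basicConfig(level=logging.INFO,
                        format='%(levelname)s %(filename)s(%(lineno)d): %(message)s')

    root.info('acb')  # dropped, because INFO is below the WARNING threshold
    return logging.getLevelName(root.level), len(root.handlers), captured.getvalue()


if __name__ == '__main__':
    print(demo_basicconfig_noop())  # ('WARNING', 1, '')
```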

Hi, is there any temporary way to work around this problem on the user side until it is fixed upstream?

You could either cherry-pick the mentioned PR and build from source or, I think, in your code base you could use logging.basicConfig(..., force=True).
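A sketch of the force=True suggestion applied to set_logger() from the first post (Python 3.8+). As a small deviation from the original, both handlers are passed through handlers= so they share the format; the logfile parameter is added only to make the function easy to test.

```python
import logging
import sys

FORMAT = '%(levelname)s %(filename)s(%(lineno)d): %(message)s'


def set_logger(logfile='log.txt'):
    """set_logger() rewritten to use force=True (requires Python 3.8+).

    force=True removes and closes any handlers already attached to the root
    logger, so the configuration below is applied even if something else
    configured logging first.
    """
    handlers = [logging.FileHandler(logfile), logging.StreamHandler(sys.stdout)]
    logging.basicConfig(level=logging.INFO, format=FORMAT,
                        handlers=handlers, force=True)
```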

Adding force does not work for me. I use Python 3.7; should I update to Python 3.8?

Yes, I think so. From the docs:

Changed in version 3.8: The force argument was added.
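If upgrading is not an option, a sketch of a Python 3.7-compatible equivalent: remove and close the existing root handlers by hand (which is what the docs say force=True does), then call basicConfig() as before. reset_root_logger is a hypothetical helper name.

```python
import logging
import sys

FORMAT = '%(levelname)s %(filename)s(%(lineno)d): %(message)s'


def reset_root_logger():
    """Remove and close every root handler, mimicking force=True on 3.8+."""
    root = logging.getLogger()
    for handler in root.handlers[:]:  # iterate over a copy while removing
        root.removeHandler(handler)
        handler.close()


def set_logger(logfile='log.txt'):
    """Python 3.7-compatible set_logger(): reset the root logger, then configure."""
    reset_root_logger()
    logging.basicConfig(level=logging.INFO, format=FORMAT, filename=logfile)
    logging.root.addHandler(logging.StreamHandler(sys.stdout))
```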

Hi @ptrblck,

This issue still exists, even though it seems fixes were pushed for the issues you cited.
I am using the PyTorch 1.9.0 CUDA 10.2 build and Python 3.7 on Linux.

Do you know how and where to initialize the logger so that it works correctly?

Thanks.

EDIT: It seems I found an implementation that works. I had been searching for a way to make logging work with torch.distributed without success, and I saw other users facing the same problem, where the logger wouldn't print anything to the console or the file.

I looked at the logger implementation here

and it works. I am not sure what I was doing wrong, but I will investigate further if I have time. There seem to be subtle differences depending on which process initializes the logger and where (before or after dist.init_process_group).
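This is not the linked implementation, only a sketch of one common pattern under stated assumptions: configure a named logger (not the root logger) after dist.init_process_group, give it its own handlers, and set propagate = False so it does not depend on whatever is attached to the root logger. Every rank writes its own file and only rank 0 also prints to stdout. The names get_rank_logger, 'train' and log_rank{rank}.txt are hypothetical.

```python
import logging
import sys

FORMAT = '%(levelname)s %(filename)s(%(lineno)d): %(message)s'


def get_rank_logger(rank, logfile_pattern='log_rank{rank}.txt', name='train'):
    """Return a named, per-rank logger that ignores root-logger configuration."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't rely on (or duplicate through) root handlers
    if not logger.handlers:  # configure only once per process
        fmt = logging.Formatter(f'[rank {rank}] ' + FORMAT)
        file_handler = logging.FileHandler(logfile_pattern.format(rank=rank))
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
        if rank == 0:  # only one process prints to the console
            console = logging.StreamHandler(sys.stdout)
            console.setFormatter(fmt)
            logger.addHandler(console)
    return logger
```

In the test script this would be called after dist.init_process_group(...), for example logger = get_rank_logger(dist.get_rank()).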