Two issues when using `torch.distributed` inside GNU screen

The first issue is that a very long time (around 12 hours) passes between the moment the watchdog catches a collective operation timeout and the moment the entire process group is actually taken down, as shown below:
(Note that I’m launching distributed training through huggingface accelerate, inside a GNU screen session. I started the run and went to sleep, leaving the session attached; the next day, when I woke my computer up, I saw the hung progress bar and then watched the entire process group get taken down. Does this mean the take-down was stuck on a blocked write to the screen’s stdout?)

```
 22%|██████▉                         | 15200/70301 [1:40:12<6:08:43,  2.49it/s, loss=0.602, lr=5e-5][E ProcessGroupNCCL.cpp:821] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=27532, OpType=ALLREDUCE, Timeout(ms)=30000) ran for 34143 milliseconds before timing out.
 22%|██████▎                      | 15215/70301 [12:06:01<1174:02:40, 76.73s/it, loss=0.51, lr=5e-5][E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
```
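
For context, the 30-second timeout in the log (`Timeout(ms)=30000`) is the value I set. The snippet below is only a minimal sketch of how such a timeout can be configured when launching through accelerate (via `InitProcessGroupKwargs`), assuming it is forwarded to `torch.distributed.init_process_group`; it is not my exact training script:

```python
# Minimal sketch: configuring a 30 s NCCL collective timeout through
# huggingface accelerate. Assumes the timeout is passed via
# InitProcessGroupKwargs; variable names are illustrative only.
from datetime import timedelta

from accelerate import Accelerator
from accelerate.utils import InitProcessGroupKwargs

# Forwarded to torch.distributed.init_process_group(timeout=...);
# the NCCL watchdog flags any collective that runs longer than this.
ipg_kwargs = InitProcessGroupKwargs(timeout=timedelta(seconds=30))
accelerator = Accelerator(kwargs_handlers=[ipg_kwargs])
```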

The second problem is that after launching my distributed code, if I enter screen’s scrollback/copy mode (Ctrl+a Esc) and stay scrolled up for longer than the timeout I set, the collective operations time out. It looks like the processes cannot write to stdout while scrollback mode is active — is that what causes the timeouts?
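
If a blocked write to the screen pty really is the trigger, one way to check would be to route the progress bar to a file instead of the attached terminal. The following is only a sketch with a hypothetical file name and a dummy loop, not my actual training code:

```python
# Sketch: write tqdm output to a log file so that screen's copy mode cannot
# block the training processes on terminal writes. The file name, step count,
# and loop body are hypothetical.
from tqdm import tqdm

with open("train_progress.log", "w") as log_file:
    for step in tqdm(range(70301), file=log_file, mininterval=30):
        pass  # the actual training step would go here
```

If the stdout-blocking hypothesis is correct, the run should survive long stays in scrollback mode with this change (the bar can then be followed with `tail -f train_progress.log` from another window).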