DDP training slows down at night

I'm training a model with DDP on two RTX 2080 Ti GPUs.
I found that rank 0 always slows down at night.
The batch time of rank 0 consistently increases at about 23:00 and drops back at about 08:00.
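
For anyone trying to reproduce a measurement like this, here is a minimal sketch of per-rank batch timing that also records the wall-clock time, so slowdowns can be lined up against the time of day. `BatchTimer` is a hypothetical helper, not part of PyTorch. The optional `sync` argument is assumed to be something like `torch.cuda.synchronize`, so that queued GPU work is counted in the elapsed time.

```python
import time
from datetime import datetime


class BatchTimer:
    """Hypothetical helper: records each batch's duration with a wall-clock stamp."""

    def __init__(self, rank, sync=None):
        self.rank = rank
        # Optional callable run before reading the clock, e.g. torch.cuda.synchronize
        # (an assumption), so asynchronous GPU work is included in the timing.
        self.sync = sync
        self.records = []  # list of (ISO timestamp, seconds)
        self._start = None

    def start(self):
        if self.sync is not None:
            self.sync()
        self._start = time.perf_counter()

    def stop(self):
        if self.sync is not None:
            self.sync()
        elapsed = time.perf_counter() - self._start
        stamp = datetime.now().isoformat(timespec="seconds")
        self.records.append((stamp, elapsed))
        return elapsed

    def mean(self, last_n=50):
        """Average duration over the most recent `last_n` batches."""
        recent = [seconds for _, seconds in self.records[-last_n:]]
        return sum(recent) / len(recent) if recent else 0.0
```

In a training loop, each rank would call `timer.start()` before the forward pass and `timer.stop()` after the optimizer step, then print `timer.mean()` together with its rank every few hundred batches.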

How can I solve this problem? Thanks.

Hey @kaka_zhao, this is a very interesting finding. We don't have any time-based logic in DDP. Do any other users share the same cluster with you? I wonder whether some recurring job is kicked off every day at 23:00 and competes with your job for network or GPU resources.
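
One rough way to check for this kind of contention is to sample GPU utilization and memory use periodically and compare daytime and nighttime readings. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output. The function names are made up for illustration, and the readings include your own training job, so they are most informative when compared over time or taken while your job is idle.

```python
import subprocess

QUERY = "timestamp,index,utilization.gpu,memory.used"


def _to_int(field):
    """Return an int, or None for values such as "[N/A]"."""
    try:
        return int(field)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    rows = []
    for line in text.strip().splitlines():
        ts, index, util, mem = [field.strip() for field in line.split(",")]
        rows.append({
            "timestamp": ts,
            "gpu": _to_int(index),
            "util_pct": _to_int(util),
            "mem_mib": _to_int(mem),
        })
    return rows


def sample_gpus():
    """Take one sample per GPU; requires an NVIDIA driver with nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `sample_gpus()` from a small loop or a cron job and appending the rows to a log file would show whether something else starts using GPU 0 around 23:00.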

Thanks for your reply. I found that an Xorg process runs at night. I now think the problem may be caused by my remote access via VNC.