I am running DDP for my training task. Occasionally, a Python exception is raised:
Traceback (most recent call last):
  File ".../python3.7/multiprocessing/queues.py", line 242, in _feed
    send_bytes(obj)
This always occurs at the end of a for loop (e.g., the training for loop or the evaluation for loop). Moreover, the exception does not cause the process to terminate.
What is the root cause of this issue? Does it affect my training?