DataLoader does not "start"

Please consider this simple fragment of code:

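print("resetting metrics")
for m in self.metrics:
    ...  # loop body not shown in this fragment

# dl is the multi-worker DataLoader
for bi, (images, labels, text) in enumerate(dl):
    if bi == 0:
        # print the batch shapes once, for the first batch only
        print("IMAGES:", images.shape)
        print("LABELS:", labels.shape)
        print("TEXT:", text.shape)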

Normally my code runs fine, but every once in a while it gets stuck. When that happens, the only output is:

* EPOCH 1/1000, START
calling concrete _epoch() method
resetting metrics

and nothing else: the shapes of the first batch are never printed. If I interrupt the process with CTRL+C, I see this:

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/opt/conda/envs/torch/lib/python3.8/multiprocessing/", line 27, in poll
    pid, sts = os.waitpid(, flag)

which seems to indicate that it was stuck waiting for another process to finish.
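To see where the main process is blocked the next time this happens, without having to press CTRL+C, I am considering a watchdog built on the standard-library faulthandler module. This is only a minimal sketch: the helper names arm_hang_watchdog and disarm_hang_watchdog and the 300-second default are my own assumptions, and it only reports the stack of the process that calls it, not the stacks of any child processes.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=300.0, stream=None):
    """Dump the traceback of every thread if we are still running after timeout_s.

    The dump repeats every timeout_s seconds until the watchdog is disarmed.
    Calling this again replaces the previous timer, so it can also be used
    to reset the countdown.
    """
    if stream is None:
        stream = sys.stderr  # resolved at call time, must have a file descriptor
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream, exit=False)


def disarm_hang_watchdog():
    """Cancel the pending dump once the suspicious section has completed."""
    faulthandler.cancel_dump_traceback_later()
```

The idea would be to call arm_hang_watchdog() right before the loop over dl and disarm_hang_watchdog() as soon as the first batch arrives, so that a hang produces a traceback in the log while a normal run prints nothing extra.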

I know the fragment is very small, but does anyone know what I could try in order to solve this problem? I cannot reproduce it reliably: the code normally runs fine, and only occasionally hangs.

I do not explicitly use any multiprocessing functions, just "plain" PyTorch code. However, the DataLoader is configured to use multiple workers.
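Since the loader uses worker processes, one thing I could try is the timeout argument of DataLoader, so that a stuck worker raises an error instead of blocking forever. Setting num_workers=0 would also show whether the hang is tied to the workers at all. The following is a sketch under assumptions, not my actual configuration: make_loader, the batch size, the worker count and the 120-second timeout are all made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, num_workers=4, timeout_s=120.0, batch_size=8):
    """Build a DataLoader that gives up on a batch after timeout_s seconds.

    With num_workers=0 the data is loaded in the main process, where a
    non-zero timeout is not accepted, so the timeout is dropped in that case.
    """
    timeout = timeout_s if num_workers > 0 else 0
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        timeout=timeout,
    )


def make_dummy_dataset(n=16):
    """Stand-in data with the same (images, labels, text) structure as my batches."""
    images = torch.zeros(n, 3, 8, 8)
    labels = torch.zeros(n, dtype=torch.long)
    text = torch.zeros(n, 5)
    return TensorDataset(images, labels, text)
```

Running the same loop once with make_loader(dataset, num_workers=0) and once with workers enabled should at least tell me whether the problem only appears in the multi-worker case.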