DataLoader gets stuck whenever training starts

Previously my training was working perfectly fine and the model trained for 27 epochs, but now that I have resumed training from the 28th epoch, training freezes because the DataLoader gets stuck. I tried with num_workers=4 and also with num_workers=0. Initially num_workers=4 was working fine. I also tried rebooting my PC, but the problem remains. I manually stopped the training when it froze, and here is the traceback from that point.

Traceback (most recent call last):
  File "", line 225, in <module>
  File "", line 80, in train
    for batch_id,X in enumerate(train_loader):
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/site-packages/torch/utils/data/", line 345, in __next__
    data = self._next_data()
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/site-packages/torch/utils/data/", line 841, in _next_data
    idx, data = self._get_data()
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/site-packages/torch/utils/data/", line 808, in _get_data
    success, data = self._try_get_data()
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/site-packages/torch/utils/data/", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/multiprocessing/", line 104, in get
    if not self._poll(timeout):
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/multiprocessing/", line 257, in poll
    return self._poll(timeout)
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/multiprocessing/", line 414, in _poll
    r = wait([self], timeout)
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/multiprocessing/", line 920, in wait
    ready =
  File "/home/pickledev/anaconda3/envs/torch_gpu/lib/python3.7/", line 415, in select
    fd_event_list = self._selector.poll(timeout)

Does the problem only occur when you try to continue the training at epoch 28, or also when you restart the complete training from scratch?

The same issue occurs whenever I restart the training.

It sounds like a system issue if everything suddenly stopped working.
Did you change any drivers, or are you running out of disk space or memory?
Could you restart the machine and, if that doesn't help, use a Docker container as a quick check?

The issue is resolved. It was related to neither the DataLoader nor multiprocessing. There was one file that made the DataLoader hang whenever it tried to load it, so I removed that file from the training set.
Thanks for your help.
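
For anyone who lands here with the same symptom: one way to track down a file like this is to load every sample one at a time, outside the DataLoader, with a time limit per sample. Below is a rough sketch of that idea, not the exact steps used in this thread. The helper name find_stuck_samples and the timeout_s parameter are made up for illustration, and it assumes only that the dataset supports len() and indexing. Each sample is loaded in a daemon thread, so a load that holds the GIL inside a C extension could still block the whole check.

```python
import threading


def find_stuck_samples(dataset, timeout_s=10.0):
    """Return (stuck, failed) index lists for a map-style dataset.

    stuck:  indices whose dataset[i] did not finish within timeout_s seconds
    failed: indices whose dataset[i] raised an exception
    Hypothetical helper; only needs __len__ and __getitem__ on the dataset.
    """
    stuck, failed = [], []

    def load(i):
        # Load one sample and record it if loading raises.
        try:
            dataset[i]
        except Exception:
            failed.append(i)

    for i in range(len(dataset)):
        # Daemon threads do not keep the interpreter alive if a load never returns.
        worker = threading.Thread(target=load, args=(i,), daemon=True)
        worker.start()
        worker.join(timeout_s)
        if worker.is_alive():
            stuck.append(i)
    return stuck, failed
```

With the loader from the traceback above, this could be called as stuck, failed = find_stuck_samples(train_loader.dataset, timeout_s=30), and the returned indices mapped back to file paths using whatever file list the dataset keeps.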