DataLoader randomly crashes after a few epochs

I’m training with a DataLoader and it randomly crashes with this error after three epochs:

Traceback (most recent call last):
  File "train.py", line 46, in <module>
    for batch_idx, (song, label) in enumerate(train_loader):
  File "/home/sauhaarda/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/home/sauhaarda/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 343, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/home/sauhaarda/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 178, in
 handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 9777) exited unexpectedly with exit code 1.

My code is available here:
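For context, here is a stripped-down sketch of the kind of setup that matches the loop in the traceback (train.py line 46). The SongDataset class, file list, and tensor shapes are simplified placeholders, not the real code:

import torch
from torch.utils.data import Dataset, DataLoader

class SongDataset(Dataset):
    # Placeholder dataset that returns (song, label) pairs, as the training loop expects.
    def __init__(self, files):
        self.files = files

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Placeholder: in the real code an audio clip and its label are loaded from disk.
        song = torch.randn(1, 16000)
        label = torch.tensor(0)
        return song, label

train_loader = DataLoader(SongDataset(files=["a.wav", "b.wav"]),
                          batch_size=32, shuffle=True, num_workers=4)

for epoch in range(5):
    for batch_idx, (song, label) in enumerate(train_loader):
        pass  # forward/backward pass omitted; the crash happens while fetching the next batch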

Could you set num_workers=0, run it again, and check whether the worker still crashes?
If not, set it to 1 and try again. It would be interesting to see whether your data is somehow corrupt.
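Something along these lines, where train_dataset is just a placeholder for whatever Dataset you already build in train.py:

from torch.utils.data import DataLoader

# `train_dataset` is a placeholder for your own Dataset object.
for workers in (0, 1):
    loader = DataLoader(train_dataset, batch_size=32, shuffle=True,
                        num_workers=workers)
    # With num_workers=0 everything runs in the main process, so an exception
    # inside your Dataset is raised with a full traceback instead of just
    # killing a worker process.
    for batch_idx, (song, label) in enumerate(loader):
        pass  # just iterate to see whether loading itself fails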

Hi, I have the same problem. It works with num_workers=0, but it is very slow.
I am running my code on a cluster with 4 GPUs, 48 CPUs, and 360 GB of memory.