EOFError in multiprocessing/connection

Dear forum,

I’m getting the error message below, which I have a hard time deciphering. PyTorch throws this error but continues running most of the time. Should I suppress it, or does it tell me that I should construct my data loader in some other way?

I’m running PyTorch 0.4 on a p2.xlarge instance.

Let me know if more information about my code is needed to help.

Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fcc2fed5f60>>
Traceback (most recent call last):
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/data/miniconda3/envs/vae_cf/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError:
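
For anyone reading the traceback: the last frames are in `multiprocessing/connection.py`, where `_recv` raises `EOFError` when it tries to read from a connection whose other end has already been closed. In the trace this happens while `_shutdown_workers` is fetching a result from a worker, which suggests (the thread does not confirm this) that the worker side was already gone. Below is a minimal standard-library sketch of that condition. It does not involve PyTorch, and the function name `recv_after_peer_closed` is made up for illustration.

```python
from multiprocessing import Pipe


def recv_after_peer_closed():
    """Read from a connection whose peer has already closed its end."""
    parent_end, child_end = Pipe()
    # Simulate the other side going away before it sends anything.
    child_end.close()
    try:
        # Same call that appears in the traceback (answer_challenge).
        parent_end.recv_bytes(256)
    except EOFError:
        return "EOFError"
    finally:
        parent_end.close()
    return "no error"


if __name__ == "__main__":
    print(recv_after_peer_closed())
```

This only shows what the exception means at the connection level. It is not a reproduction of the DataLoader problem itself.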

I see this as well, no idea why. It appears to happen intermittently, and I can’t seem to reproduce it consistently.

It may be a PyTorch version issue. I hit this problem in version 1.3.0, but after changing the PyTorch version to 1.0.1 it works fine.

Do you have a reproducible code snippet raising this error so that we could debug it?

I ran the NMT code from Facebook at https://github.com/facebookresearch/UnsupervisedMT.
It raised the EOFError with PyTorch version 1.3.0, but it runs fine with version 1.0.1.
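
Since the behaviour seems to differ between versions, it may help to post the exact versions together with the traceback. Here is a small sketch that collects them, assuming Python 3.8 or newer (for `importlib.metadata`). The helper names `installed_version` and `environment_report` are hypothetical and not part of PyTorch.

```python
import platform
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("torch",)):
    """Build a short plain-text summary suitable for pasting into a post."""
    lines = ["python " + platform.python_version()]
    for name in packages:
        lines.append(f"{name} {installed_version(name) or 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running it in both environments (the one that shows the error and the one that does not) gives a like-for-like comparison to include in a bug report.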