ConnectionRefusedError: [Errno 111] Connection refused

When I train my model, I get the following error, and I have no idea how to fix it:

Traceback (most recent call last):
  File "/home/zyd/PycharmProjects/ship_detect/train.py", line 166, in <module>
    train()
  File "/home/zyd/PycharmProjects/ship_detect/train.py", line 88, in train
    for idx, (images, masks) in enumerate(data_loader):
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 272, in __next__
    return self._process_next_batch(batch)
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 115, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 4 at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/TH/generic/THTensorMath.c:3577

Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fd368331f98>>
Traceback (most recent call last):
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/home/zyd/anaconda3/lib/python3.6/multiprocessing/queues.py", line 344, in get
    return _ForkingPickler.loads(res)
  File "/home/zyd/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/zyd/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/zyd/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/zyd/anaconda3/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/zyd/anaconda3/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

Have you solved this problem? If so, how did you solve it?

The actual error is:

RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 3 and 4 at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/TH/generic/THTensorMath.c:3577

It seems your dataset is returning samples with different numbers of dimensions (some tensors are 3-D and others 4-D), so the DataLoader’s default_collate can’t stack them into a batch.
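One way to track down the offending samples is to walk the dataset without a DataLoader and compare the number of dimensions of each returned item. The sketch below is only an illustration: `find_dim_mismatches` is a hypothetical helper (not part of PyTorch), and it assumes every `dataset[i]` is a tuple such as `(image, mask)` whose elements expose a `.shape` attribute, as `torch.Tensor` and `numpy.ndarray` do.

```python
from collections import Counter


def find_dim_mismatches(dataset, limit=None):
    """Return (index, field, ndim) for items whose number of dimensions
    differs from the most common one for that field.

    Assumption: dataset[i] is a tuple like (image, mask), all samples have
    the same number of fields, and each element has a .shape attribute.
    """
    n = len(dataset) if limit is None else min(limit, len(dataset))

    # ndims[i][field] = number of dimensions of that field in sample i
    ndims = []
    for i in range(n):
        sample = dataset[i]
        ndims.append([len(item.shape) for item in sample])

    if not ndims:
        return []

    mismatches = []
    for field in range(len(ndims[0])):
        # Treat the most frequent ndim as the expected one for this field.
        counts = Counter(row[field] for row in ndims)
        expected = counts.most_common(1)[0][0]
        for i, row in enumerate(ndims):
            if row[field] != expected:
                mismatches.append((i, field, row[field]))
    return mismatches
```

Calling `find_dim_mismatches(train_dataset)` (with `train_dataset` standing in for your own dataset object) lists the indices to inspect. A 4-D item among 3-D ones often points to an extra `unsqueeze` or to a file that was loaded with a different layout, but that depends on your `__getitem__`.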


I have the same issue. I think this error is just a consequence of the connection refused error, which is caused by multiprocessing.

I got this ConnectionRefusedError too, but I didn’t get any other error. It seems to be raised when you try to get a torch.Tensor from a multiprocessing.Queue, and I don’t know why.
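In the traceback at the top of this thread, the ConnectionRefusedError is raised from `_shutdown_workers` inside `__del__`, after the RuntimeError from the worker has already been reported. A possible way to check whether a dataset error is hiding behind the connection error is to load the data in the main process with `num_workers=0`, so no worker processes or queues are involved. The following is a minimal sketch under that assumption: `ToyDataset` and `first_loader_error` are made-up names for illustration, and the toy dataset deliberately returns one 4-D mask to reproduce the mismatch.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset: sample 2 returns a 4-D mask instead of 3-D."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        image = torch.zeros(3, 8, 8)
        if idx == 2:
            mask = torch.zeros(1, 1, 8, 8)  # deliberately wrong shape
        else:
            mask = torch.zeros(1, 8, 8)
        return image, mask


def first_loader_error(dataset, batch_size=2):
    """Iterate once in the main process and return the first RuntimeError.

    num_workers=0 disables worker processes, so any collate error is raised
    directly instead of being passed through a multiprocessing queue.
    """
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=0)
    try:
        for _ in loader:
            pass
    except RuntimeError as exc:
        return exc
    return None


if __name__ == "__main__":
    print(first_loader_error(ToyDataset()))
```

If the loop finishes cleanly with `num_workers=0` but fails with workers enabled, the problem is more likely in the multiprocessing setup than in the dataset itself.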