Hi folks,
I was trying to re-run the CNN/DM fine-tuning example by following the instructions from the repo: https://github.com/pytorch/fairseq/blob/master/examples/bart/README.summarization.md#4-fine-tuning-on-cnn-dm-summarization-task
When I run the command (point #4 in the link), it starts the training loop and prints the progress bar, but then fails with an OSError.
epoch 001: 0%| | 0/29399 [00:00<?, ?it/s]
2020-08-21 05:36:29 | INFO | fairseq.trainer | begin training epoch 1
Traceback (most recent call last):
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 322, in reduce_storage
    df = multiprocessing.reduction.DupFd(fd)
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
    return resource_sharer.DupFd(fd)
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/multiprocessing/resource_sharer.py", line 53, in __init__
    self._id = _resource_sharer.register(send, close)
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/multiprocessing/resource_sharer.py", line 77, in register
    self._start()
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/multiprocessing/resource_sharer.py", line 130, in _start
    self._listener = Listener(authkey=process.current_process().authkey)
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/multiprocessing/connection.py", line 438, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/projects/anaconda3/envs/py36-fairseq/lib/python3.6/multiprocessing/connection.py", line 576, in __init__
    self._socket.bind(address)
OSError: AF_UNIX path too long
I couldn't find anything related to this online, except that it is the error Python raises when the path of a UNIX domain socket address exceeds the OS length limit.
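In case it helps narrow things down, here is a rough diagnostic sketch, not a confirmed fix. It assumes that multiprocessing places its listener socket under the directory returned by tempfile.gettempdir() (which follows TMPDIR), in a "pymp-" subdirectory with a "listener-" file name, each followed by 8 random characters, and that the limit is the 108-byte sun_path on Linux. The helper names estimated_socket_path and fits_af_unix are made up for illustration.

```python
import os
import tempfile

# Assumption: sockaddr_un.sun_path is 108 bytes on Linux (104 on macOS),
# and one of those bytes is needed for the terminating NUL.
SUN_PATH_MAX = 108


def estimated_socket_path(tmp_root=None):
    """Build a path shaped like the ones multiprocessing is assumed to use."""
    root = tmp_root if tmp_root is not None else tempfile.gettempdir()
    # Placeholder "x" characters stand in for the random name suffixes.
    return os.path.join(root, "pymp-" + "x" * 8, "listener-" + "x" * 8)


def fits_af_unix(path, limit=SUN_PATH_MAX):
    """Conservative check: the encoded path plus a NUL byte must fit."""
    return len(os.fsencode(path)) < limit


if __name__ == "__main__":
    path = estimated_socket_path()
    print("temp dir:", tempfile.gettempdir())
    print("estimated socket path length:", len(os.fsencode(path)))
    print("fits in AF_UNIX address:", fits_af_unix(path))
```

If the last line prints False, the temp directory itself is probably too deep, and pointing TMPDIR at a shorter directory before launching training might be worth trying. I have not verified that this resolves the error.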
Has anyone encountered something similar?
Thanks.