Description of the problem
The error occurs whenever num_workers > 0. If I set num_workers = 0 the error disappears, but training becomes much slower, so I suspect the multiprocessing data loading is the cause. How can I solve this problem?
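From what I have read, one common cause of this error is exhausting the per-process open-file limit: with PyTorch's default file_descriptor sharing strategy, every tensor shared between DataLoader workers consumes a file descriptor. A stdlib-only sketch of raising the soft limit toward the hard limit (this is a guess on my part, not a confirmed fix for my setup):

```python
import resource

# Query the current per-process open-file limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit to the hard limit; this needs no extra privileges
# because the soft limit may always be raised up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

print(resource.getrlimit(resource.RLIMIT_NOFILE)[0] == hard)  # True
```

If the fd limit is the culprit, this (or switching to the file_system sharing strategy) would avoid running out of descriptors while workers share batches.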
env
Docker, Python 3.8, PyTorch 1.11.0+cu113
error output
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 149, in _serve
    send(conn, destination_pid)
  File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 50, in send
    reduction.send_handle(conn, new_fd, pid)
  File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 184, in send_handle
    sendfds(s, [handle])
  File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 149, in sendfds
    sock.sendmsg([msg], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 151, in _serve
    close()
  File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 52, in close
    os.close(new_fd)
OSError: [Errno 9] Bad file descriptor

Traceback (most recent call last):
  File "save_disp.py", line 85, in <module>
    test()
  File "save_disp.py", line 55, in test
    for batch_idx, sample in enumerate(TestImgLoader):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1207, in _next_data
    idx, data = self._get_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1173, in _get_data
    success, data = self._try_get_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1011, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/opt/conda/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 295, in rebuild_storage_fd
    fd = df.detach()
  File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 189, in recv_handle
    return recvfds(s, 1)[0]
  File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 159, in recvfds
    raise EOFError
EOFError
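For context, the "Bad file descriptor" half of the trace comes from CPython passing file descriptors over a Unix socket (multiprocessing.reduction.sendfds). A stdlib-only sketch of that mechanism, showing how a descriptor that is closed before it is shared produces this exact errno (the socket/pipe setup here is mine, not from save_disp.py):

```python
import array
import os
import socket

# sendfds() passes a file descriptor over a Unix socket via SCM_RIGHTS.
# If the fd has already been closed when it is sent -- e.g. a worker tore
# down its shared storage early -- the kernel rejects it with EBADF,
# which is the "OSError: [Errno 9] Bad file descriptor" in the trace.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
r, w = os.pipe()
os.close(r)  # simulate the descriptor being closed before it is shared

errno_seen = None
try:
    parent.sendmsg(
        [b"x"],
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [r]))],
    )
except OSError as e:
    errno_seen = e.errno

print(errno_seen)  # 9 (EBADF) on Linux

os.close(w)
parent.close()
child.close()
```

The EOFError in the main process is then just the receiving side seeing the sender's connection die mid-handshake.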