Unable to open shared memory object </torch_3500_2599739126>

Hi, I've run into the following error many times. Could anyone tell me what the possible causes are? Thanks so much.

Traceback (most recent call last):
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
  File "/home/zhou_rui/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 44, in _worker_loop
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/queues.py", line 349, in put
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/zhou_rui/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 113, in reduce_storage
RuntimeError: unable to open shared memory object </torch_3500_2599739126> in read-write mode at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/TH/THAllocator.c:230

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/util.py", line 254, in _run_finalizers
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/util.py", line 186, in __call__
  File "/home/zhou_rui/anaconda3/lib/python3.6/shutil.py", line 476, in rmtree
  File "/home/zhou_rui/anaconda3/lib/python3.6/shutil.py", line 474, in rmtree
OSError: [Errno 24] Too many open files: '/tmp/pymp-6ll9wgxr'
Process Process-11:
Traceback (most recent call last):
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
  File "/home/zhou_rui/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 44, in _worker_loop
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/queues.py", line 349, in put
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/zhou_rui/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 113, in reduce_storage
RuntimeError: unable to open shared memory object </torch_3500_2599739126> in read-write mode at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/TH/THAllocator.c:230
[]
iter: 2905 Time 0.896 Data 0.013 Loss 4.7818 RPN 3.1798 2.1379 0.1042 ODN 1.6020 1.5693 0.0072 0.0256
['bottle' 'bottle' 'bottle' 'bottle']
T
iter: 2906 Time 0.876 Data 0.002 Loss 2.4086 RPN 1.4914 0.2360 0.1255 ODN 0.9172 0.8927 0.0000 0.0244
[]
iter: 2907 Time 0.816 Data 0.003 Loss 3.5380 RPN 2.0609 0.3948 0.1666 ODN 1.4772 1.4434 0.0126 0.0212
['vase']
T
iter: 2908 Time 0.887 Data 0.005 Loss 1.9561 RPN 1.2698 0.3745 0.0895 ODN 0.6863 0.6636 0.0004 0.0223
Traceback (most recent call last):
  File "main.py", line 522, in <module>
    main()
  File "main.py", line 294, in main
    i, args.eval_freq)
  File "main.py", line 346, in train
    for i, (inputs, anns,paths) in enumerate(train_loader):
  File "/home/zhou_rui/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 195, in __next__
    idx, batch = self.data_queue.get()
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/queues.py", line 345, in get
    return _ForkingPickler.loads(res)
  File "/home/zhou_rui/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/zhou_rui/anaconda3/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

Run:

ulimit -a

to check your current limits, and then raise the system-wide limit on open file handles:

sudo sysctl -w fs.file-max=100000
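
If raising the limits doesn't help, a workaround often suggested for this exact RuntimeError is to switch PyTorch's sharing strategy so that tensors are passed between DataLoader workers through the file system instead of through open file descriptors. A minimal sketch (place it near the top of the training script, before the DataLoader is created):

import torch.multiprocessing

# The default "file_descriptor" strategy keeps one file descriptor open
# per tensor shared between DataLoader workers, which can exhaust the
# per-process open-file limit. "file_system" shares tensors through
# named files in shared memory instead.
torch.multiprocessing.set_sharing_strategy('file_system')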


Hi, thanks for your reply. That didn't work for me. I suspect the per-process limit on open files, rather than the system-wide one, could be the cause. What would be a reasonable value for that limit? Thank you!
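
For reference, the per-process limit can be checked and raised from inside the script itself; a minimal sketch using the standard resource module (the target value of 4096 below is only illustrative):

import resource

# Inspect the per-process limits on open file descriptors
# (the shell equivalent of the soft limit is `ulimit -n`).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limits: soft =', soft, 'hard =', hard)

# Raise the soft limit; 4096 is an illustrative value, capped at the
# hard limit since an unprivileged process cannot exceed it.
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))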