Too many open files when using DataLoader

Hi,
When I use the DataLoader, I run into the following error: Too many open files.
In my Dataset implementation, I use torch.load('xxx') to load the data files (which are all tensors stored on disk), and when __getitem__(self, index) is called, it takes the corresponding items from the tensor and returns them.
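
For context, a minimal sketch of what my Dataset roughly looks like (names and structure here are simplified, not my actual code):

import torch
from torch.utils.data import Dataset

class TensorFileDataset(Dataset):
    def __init__(self, path):
        # Load the whole tensor file into memory once (path is a placeholder).
        self.data = torch.load(path)

    def __len__(self):
        return self.data.size(0)

    def __getitem__(self, index):
        # Return the item at the given index from the preloaded tensor.
        return self.data[index]

    def collate_fn(self, batch):
        # Stack the individual samples into one batch tensor.
        return torch.stack(batch, dim=0)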

I construct the DataLoader in the following manner:

dataset = Dataset(xxxxx)
dataLoader = torch.utils.data.DataLoader(dataset=dataset, batch_size=batchSize, shuffle=shuffle,
                                         num_workers=numWorkers, collate_fn=dataset.collate_fn)

The details of the error are below:

referTypes, nNodes, featureIds, sIds) in enumerate(dataloader):
  File "/u/home/home/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 195, in __next__
    idx, batch = self.data_queue.get()
  File "/u/home/home/anaconda2/lib/python2.7/multiprocessing/queues.py", line 378, in get
    return recv()
  File "/u/home/home/anaconda2/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
    return pickle.loads(buf)
  File "/u/home/home/anaconda2/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/u/home/home/anaconda2/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/u/home/home/anaconda2/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/u/home/home/anaconda2/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
    fd = multiprocessing.reduction.rebuild_handle(df)
  File "/u/home/home/anaconda2/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
    conn = Client(address, authkey=current_process().authkey)
  File "/u/home/home/anaconda2/lib/python2.7/multiprocessing/connection.py", line 169, in Client
    c = SocketClient(address)
  File "/u/home/home/anaconda2/lib/python2.7/multiprocessing/connection.py", line 320, in SocketClient
    fd = duplicate(s.fileno())
OSError: [Errno 24] Too many open files

Thanks!

Well, it could be that you are hitting your system's limit. A lot of things count as files on Unix-based systems. Try rebooting. If that doesn't work, you can try raising the limit as described at https://stackoverflow.com/questions/39537731/errno-24-too-many-open-files-but-i-am-not-opening-files
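
If you prefer to do it from inside the Python process rather than the shell, something along these lines should also work (a sketch using the standard resource module; it can only raise the soft limit up to the hard limit):

import resource

# Current soft and hard limits on the number of open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit up to the hard limit; a non-root process cannot go beyond it.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))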

Thanks for the reply!
In fact, I have already tried setting ulimit to 100000, but the error still occurs sometimes.

I am facing the same error. I have rebooted my Ubuntu server and tried setting ulimit, but I get "ulimit: value exceeds hard limit" when I try to set anything much above 4096, even though running plain ulimit reports unlimited. Is there any other solution for this?

As a workaround, I used the instructions mentioned here to increase the hard limit.
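
For anyone else stuck at the hard limit: on many Linux systems the hard limit itself can be raised by adding lines like these to /etc/security/limits.conf (65535 is just an example value) and logging back in:

* soft nofile 65535
* hard nofile 65535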

I wonder if this would be helpful:

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')

https://pytorch.org/docs/stable/multiprocessing.html#file-descriptor-file-descriptor
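
If it helps anyone: as far as I understand, that call has to run before the DataLoader (and its worker processes) is created, e.g. near the top of the training script:

import torch.multiprocessing

# Use the file_system sharing strategy so tensors shared with worker processes
# are referenced by file name instead of keeping a file descriptor open for each one.
torch.multiprocessing.set_sharing_strategy('file_system')

# ... construct the Dataset / DataLoader afterwards as usual ...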