num_workers in DataLoader always gives this error

Whenever I set num_workers on a DataLoader, I get this error:

OSError: [Errno 16] Device or resource busy: '.nfs00000000087c7062000001de'

I tried googling it, but nothing came up, which is always scary. The full stack trace is below.

Traceback (most recent call last):
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 277, in _run_finalizers                          
    finalizer()
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 201, in __call__                                 
    res = self._callback(*self._args, **self._kwargs)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 277, in _run_finalizers                          
    finalizer()
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 110, in _remove_temp_dir                         
    rmtree(tempdir)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 201, in __call__                                 
    res = self._callback(*self._args, **self._kwargs)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 494, in rmtree                                                 
    _rmtree_safe_fd(fd, path, onerror)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 110, in _remove_temp_dir                         
    rmtree(tempdir)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 452, in _rmtree_safe_fd                                        
    onerror(os.unlink, fullname, sys.exc_info())
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 494, in rmtree                                                 
    _rmtree_safe_fd(fd, path, onerror)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 450, in _rmtree_safe_fd                                        
    os.unlink(entry.name, dir_fd=topfd)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 452, in _rmtree_safe_fd                                        
    onerror(os.unlink, fullname, sys.exc_info())
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 450, in _rmtree_safe_fd                                        
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000087c7066000001db'                                                         
OSError: [Errno 16] Device or resource busy: '.nfs00000000087c7064000001dc'                                                         
Traceback (most recent call last):
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 277, in _run_finalizers                          
    finalizer()
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 201, in __call__                                 
    res = self._callback(*self._args, **self._kwargs)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 110, in _remove_temp_dir                         
    rmtree(tempdir)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 494, in rmtree                                                 
    _rmtree_safe_fd(fd, path, onerror)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 452, in _rmtree_safe_fd                                        
    onerror(os.unlink, fullname, sys.exc_info())
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 450, in _rmtree_safe_fd                                        
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000087c7055000001dd'                                                         
Traceback (most recent call last):
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 277, in _run_finalizers                          
    finalizer()
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 201, in __call__                                 
    res = self._callback(*self._args, **self._kwargs)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/multiprocessing/util.py", line 110, in _remove_temp_dir                         
    rmtree(tempdir)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 494, in rmtree                                                 
    _rmtree_safe_fd(fd, path, onerror)
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 452, in _rmtree_safe_fd                                        
    onerror(os.unlink, fullname, sys.exc_info())
  File "/st2/jeff/anaconda3/envs/jeff/lib/python3.7/shutil.py", line 450, in _rmtree_safe_fd                                        
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000087c7062000001de'    

Are you using a network file system? If so, could you try using a local drive instead?
Based on the error message, my best guess is that your NFS mount has too many open files, which conflicts with the creation of shared memory.

9 months later, I finally figured out what this was. In one of my environments, the drive holding the /home folder was full, and I got tired of the resulting errors, so I moved the tmp folder to another drive. That drive was an NFS filesystem, and that is what caused this error.
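
For anyone who wants to check for the same cause, here is a rough sketch that reports which filesystem type backs Python's temp directory. It assumes Linux (it reads /proc/mounts), and fs_type_for is just a helper name made up for this post, not part of PyTorch or the standard library.

```python
import os
import tempfile


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts (Linux only).
    Returns None if no mount point matches.
    """
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Keep the longest mount point that is a path prefix of `path`.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Only attempt the lookup where /proc/mounts exists (i.e. on Linux).
if os.path.exists("/proc/mounts"):
    tmp = tempfile.gettempdir()  # honours TMPDIR, TEMP and TMP
    with open("/proc/mounts") as f:
        print(tmp, "->", fs_type_for(tmp, f.read()))
```

If this prints nfs or nfs4 for the temp directory, the setup is likely the same as the one described above.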

I fixed it with unset TMPDIR.
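
If changing the shell environment is not an option, something similar can probably be done from inside the script. This is only a sketch: use_local_tmpdir is a made-up helper, and "/tmp" is an assumption about where a local (non-NFS) disk is mounted on your machine. Judging from the traceback, the directory that fails to be removed is the one multiprocessing creates under the temp directory, so this would need to run before the DataLoader starts its workers.

```python
import os
import tempfile


def use_local_tmpdir(local_dir="/tmp"):
    """Point Python's temp-file machinery at `local_dir`.

    `local_dir` is assumed to exist, be writable, and live on a local disk.
    Returns the temp directory Python will use afterwards.
    """
    os.environ["TMPDIR"] = local_dir  # inherited by worker processes
    tempfile.tempdir = None           # drop the cached value so TMPDIR is re-read
    return tempfile.gettempdir()
```

Call use_local_tmpdir() near the top of the training script, before constructing the DataLoader, and check that the returned path is the one you expect. If the directory is not writable, tempfile falls back to other candidates, so the return value is worth checking.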

Hi,
I recently encountered the same issue, so I removed the num_workers option. Can you elaborate on the issue?
For me, the following code caused the error:

from torch.utils.data import DataLoader
from tqdm import tqdm

# train_dataset and device are defined elsewhere
train_loader = DataLoader(train_dataset, shuffle=True, pin_memory=True, batch_size=16, num_workers=1)
for epoch in range(3):
    pbar = tqdm(train_loader)
    for model_inputs, labels in pbar:
        model_inputs = model_inputs.to(device)

Is there a locking mechanism in place that causes the issue?