I have tried nohup, tmux, and screen, but as soon as I log out of the remote machine, the PyTorch DataLoader dies. Here is the error stack:
Traceback (most recent call last):
  File "miika_method_construction_only.py", line 333, in <module>
    num_epochs=NUM_EPOCHS)
  File "miika_method_construction_only.py", line 158, in train_model
    for data in dset_loaders[phase]:
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 336, in __next__
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
PermissionError: Traceback (most recent call last):
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
  File "/data/graphics/SpandanGraphsProject/Spandan_Experiments/Bayesian_Tool_Learning/neural_net/parameter_loader_all_tools.py", line 208, in __getitem__
    sample = self.loader(path)
  File "/data/graphics/SpandanGraphsProject/Spandan_Experiments/Bayesian_Tool_Learning/neural_net/parameter_loader_all_tools.py", line 257, in default_loader
    return pil_loader(path)
  File "/data/graphics/SpandanGraphsProject/Spandan_Experiments/Bayesian_Tool_Learning/neural_net/parameter_loader_all_tools.py", line 239, in pil_loader
    img = Image.open(f)
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/lib/python3.6/site-packages/PIL/Image.py", line 2591, in open
  File "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/lib/python3.6/site-packages/PIL/Image.py", line 378, in preinit
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 780, in get_code
  File "<frozen importlib._bootstrap_external>", line 832, in get_data
PermissionError: [Errno 13] Permission denied: '/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/lib/python3.6/site-packages/PIL/BmpImagePlugin.py'
I made sure that all the files under python3.6 (and, recursively, all of its children) have read and write permissions by running chmod -R 777.
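Since the mode bits look fine from the login shell, it may help to check readability from inside the running Python process instead, both before and after logging out. Below is a minimal sketch of such a check. The helper name `can_read` is hypothetical (it is not part of PyTorch or PIL), and the path is simply the one reported in the traceback.

```python
import os


def can_read(path):
    """Try to actually open the file; return (True, None) or (False, error)."""
    try:
        with open(path, "rb") as f:
            f.read(1)  # force a real read, not just a metadata lookup
        return True, None
    except OSError as exc:  # PermissionError and FileNotFoundError are subclasses
        return False, exc


# Path copied from the traceback; replace it with any file the job imports lazily.
plugin = (
    "/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/"
    "lib/python3.6/site-packages/PIL/BmpImagePlugin.py"
)
ok, err = can_read(plugin)
status = "readable" if ok else "NOT readable: %r" % (err,)
print("pid %d: %s is %s" % (os.getpid(), plugin, status))
```

Calling this periodically from the training loop (for example once per epoch) and logging the result would show whether the file becomes unreadable at the moment of logout, independently of the DataLoader.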
This is happening in a conda environment with PyTorch 0.4.1. Any leads at all?
Thanks,
Spandan