Error : RuntimeError: unable to write to file </torch_18693_1954506624> at /pytorch/torch/lib/TH/THAllocator.c:271
I encountered this error when running PyTorch code on an Ubuntu server.
While debugging, I found that the error occurs in the DataLoader.
The dataset’s __getitem__ method returns (img, label), where img is an ndarray. I also tried returning img as a Tensor, but in that case the process blocks.
The code runs properly on my local machine but fails on the server.
Are you using Docker?
I had a similar issue and had to add the --ipc=host flag.
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
You might not have enough shared memory, so you could try to increase it on your system (or docker, if you are using it).
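To see how much shared memory is actually available, something like the following minimal sketch might help. It uses only the standard library and assumes a Linux system where shared memory is mounted at /dev/shm (the usual location, but an assumption here). The function name shm_usage_mb is made up for illustration.

```python
import os
import shutil


def shm_usage_mb(path="/dev/shm"):
    """Return (total, used, free) in MiB for the filesystem at `path`.

    Returns None if `path` does not exist, e.g. on non-Linux systems.
    The default path is an assumption about where shared memory is mounted.
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return (usage.total / mib, usage.used / mib, usage.free / mib)


if __name__ == "__main__":
    stats = shm_usage_mb()
    if stats is None:
        print("/dev/shm not found; this check assumes Linux")
    else:
        total, used, free = stats
        print(f"/dev/shm: total={total:.0f} MiB, "
              f"used={used:.0f} MiB, free={free:.0f} MiB")
```

Inside a Docker container started without --shm-size or --ipc=host, the total reported is typically only 64 MiB, which is Docker's default and is easily exhausted by several DataLoader workers.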
I would also recommend updating to the latest stable PyTorch version (1.5), just in case you are hitting an older bug.
If you are using multiple workers in your DataLoader, you could also try to set num_workers=0 for the sake of debugging.
Thanks~ I killed the other processes so that only this PyTorch task was running, and the problem disappeared. The cause was that my system did not have enough shared memory. Thanks for your reply~
I don’t know of any alternatives to shared memory for multiprocessing IPC.
The fallback would be to load the data in the main process via num_workers=0, but this would also reduce performance.
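If you want that fallback to happen automatically, one rough way to sketch it is to pick the worker count based on how much shared memory is free. This is only an illustration under stated assumptions: the helper choose_num_workers is hypothetical (not part of PyTorch), /dev/shm is assumed to be the shared memory mount, and the threshold is something you would have to tune yourself.

```python
import os
import shutil


def choose_num_workers(requested, min_free_bytes, shm_path="/dev/shm"):
    """Return `requested` if enough shared memory is free, otherwise 0.

    Hypothetical helper: `min_free_bytes` is an arbitrary threshold chosen
    by the caller, and `shm_path` assumes the usual Linux mount point.
    If `shm_path` does not exist, free space cannot be measured, so the
    requested value is returned unchanged.
    """
    if requested <= 0:
        return 0
    if not os.path.isdir(shm_path):
        return requested
    free = shutil.disk_usage(shm_path).free
    # Fall back to loading in the main process when shared memory is scarce.
    return requested if free >= min_free_bytes else 0
```

The result could then be passed on, e.g. DataLoader(dataset, num_workers=choose_num_workers(4, 1 << 30)). The 1 GiB threshold there is just a placeholder, not a recommended value.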