ImageNet example is crashing

FuriouslyCurious · March 26, 2017, 5:54am

Tried to run the ImageNet example (https://github.com/pytorch/examples/tree/master/imagenet) under Python3.5, and it is trying to write data to Soumith’s computer. I don’t think my Ethernet cable is long enough to reach San Francisco, so what are my other options?

Process Process-4:
Traceback (most recent call last):
  File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 36, in _worker_loop
    data_queue.put((idx, samples))
  File "/conda3/envs/idp/lib/python3.5/multiprocessing/queues.py", line 349, in put
    obj = ForkingPickler.dumps(obj)
  File "/conda3/envs/idp/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
  File "/conda3/envs/idp/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 113, in reduce_storage
    fd, size = storage._share_fd_()
RuntimeError: unable to write to file </torch_6487_1133870694> at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.10_1488757768560/work/torch/lib/TH/THAllocator.c:267

tom · March 26, 2017, 8:34am

Hi,

this does not try to write to Soumith’s computer; the source file location is embedded into the error message when the PyTorch library is compiled. Apparently the conda builds are done in the directory /data/users/soumith/miniconda2/conda-bld.
If you compile pytorch from source, you get to have your own path there.

The path that it appears to try to write to is /torch_6487_1133870694 (at the filesystem root) and you would not want to have it there. I suspect that you either don’t have TEMP set or have some other path wrong. Maybe you have an empty path where you would want ‘.’ instead.

Best regards

Thomas

Nord786 · May 4, 2017, 12:23pm

I had this problem when using the Docker image. The solution was to add the flag “--ipc=host” to docker run:
nvidia-docker run --rm -ti --ipc=host pytorch-cudnnv6
Full doc for docker image - https://github.com/pytorch/pytorch#docker-image
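
For anyone hitting the same traceback: the DataLoader workers pass tensors between processes through shared memory, and a container's private shared-memory mount is typically small (64 MB by default in Docker), which is why --ipc=host helps. Below is a rough sketch for checking how much shared memory is available before starting training. It assumes a Linux system where shared memory is backed by /dev/shm; the function name shared_memory_report is made up for this example and is not part of PyTorch.

```python
import os
import shutil

# Assumption: on Linux, POSIX shared memory is backed by the tmpfs at /dev/shm.
SHM_PATH = "/dev/shm"


def shared_memory_report(path=SHM_PATH):
    """Return total and free space (in MiB) for the given mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return {
        "path": path,
        "total_mib": usage.total // mib,
        "free_mib": usage.free // mib,
    }


if __name__ == "__main__":
    report = shared_memory_report()
    if report is None:
        print("No /dev/shm found; this check only applies to Linux.")
    else:
        print("{path}: {total_mib} MiB total, {free_mib} MiB free".format(**report))
```

If the reported total is only a few dozen MiB inside the container, the shared-memory limit is a likely suspect. Besides --ipc=host, docker run also accepts --shm-size to raise the limit without sharing the host's IPC namespace.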

ml9951 · June 19, 2017, 10:43pm

This solved it for me. Thanks!