I’d like to convert the mnist_hogwild example to a CUDA version.
So I tried changing this line ( https://github.com/pytorch/examples/blob/master/mnist_hogwild/main.py#L52 ) to
model = Net().cuda()
I also added multiprocessing.set_start_method('spawn').
However, I got this error when I executed the code (before the subprocesses start):
THCudaCheck FAIL file=/Users/qbx2/pytorch/torch/csrc/generic/StorageSharing.cpp line=249 error=63 : OS call failed or operation not supported on this OS
Traceback (most recent call last):
  File "main.py", line 63, in <module>
    p.start()
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 104, in reduce_storage
    metadata = storage._share_cuda_()
RuntimeError: cuda runtime error (63) : OS call failed or operation not supported on this OS at /Users/qbx2/pytorch/torch/csrc/generic/StorageSharing.cpp:249
To work around this, I changed the code to call model.cuda() inside each subprocess, in the train() function, and that works fine. But what does this error mean? Is it prohibited to pass a CUDA model to a subprocess? That doesn’t make sense to me. I’m using macOS Sierra (10.12.5), Python 3.6.1, and CUDA 8.0.
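For reference, here is a minimal sketch of the workaround I described. It is not the actual example code: the tiny Net, the launch() helper, and the CPU fallback are stand-ins I made up for illustration, and I use get_context('spawn') here instead of set_start_method('spawn'). The point is only that the parent keeps the model on the CPU and each child moves it to the GPU itself, so no CUDA storage has to be pickled when p.start() runs.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


class Net(nn.Module):
    """Tiny stand-in model (hypothetical, not the real MNIST Net)."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


def train(rank, model):
    # The model arrives on the CPU; move it to the GPU here, inside the
    # child process. Falls back to the CPU when CUDA is unavailable
    # (the fallback is an addition for this sketch).
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    out = model(torch.ones(1, 4, device=device))
    # Returned only so the function can be checked in-process;
    # Process ignores the return value of its target.
    return tuple(out.shape)


def launch(num_processes=2):
    # Hypothetical helper: start the workers with the 'spawn' start method.
    ctx = mp.get_context("spawn")
    model = Net()          # stays on the CPU in the parent
    model.share_memory()   # as in the original example
    procs = [ctx.Process(target=train, args=(rank, model))
             for rank in range(num_processes)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With 'spawn', launch() should be called from under an if __name__ == '__main__': guard in the script. One thing I noticed: once each worker moves the model to the GPU it has its own copy of the parameters, so the updates are no longer shared between processes the way they are in the CPU Hogwild example.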
Thank you.