Mnist_hogwild with cuda not working with model = Net().cuda()

I'd like to convert the mnist_hogwild example to a CUDA version.

So I tried changing this line (https://github.com/pytorch/examples/blob/master/mnist_hogwild/main.py#L52) to
model = Net().cuda()

Also, I added multiprocessing.set_start_method('spawn').
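For context on why the 'spawn' start method matters here: with spawn, the child is a fresh interpreter, so every argument handed to mp.Process must cross the process boundary via pickling — torch installs custom ForkingPickler reductions for this, and that is exactly the code path (reductions.py / storage sharing) that fails in the traceback below. A minimal, torch-free illustration of that pickling step, using a hypothetical stand-in Net:

```python
import pickle

class Net:
    """Hypothetical stand-in for the example's model; no torch needed here."""
    def __init__(self):
        self.weights = [0.0] * 4

# With the 'spawn' start method, everything passed to the child process is
# serialized with a pickler; for CUDA-backed storages this triggers the
# storage-sharing call that the traceback below shows failing on macOS.
blob = pickle.dumps(Net())
restored = pickle.loads(blob)
print(len(restored.weights))  # 4
```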

By the way, I got this error when I executed the code (before the subprocesses start):

THCudaCheck FAIL file=/Users/qbx2/pytorch/torch/csrc/generic/StorageSharing.cpp line=249 error=63 : OS call failed or operation not supported on this OS
Traceback (most recent call last):
  File "main.py", line 63, in <module>
    p.start()
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/qbx2/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/Users/qbx2/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 104, in reduce_storage
    metadata = storage._share_cuda_()
RuntimeError: cuda runtime error (63) : OS call failed or operation not supported on this OS at /Users/qbx2/pytorch/torch/csrc/generic/StorageSharing.cpp:249

To resolve this issue, I changed the code to call model.cuda() in the subprocesses, inside the train() method, and it works fine. But what is this error? Is it prohibited to pass a CUDA model to a subprocess? That doesn't make sense to me. I'm using macOS Sierra (10.12.5), Python 3.6.1, and CUDA 8.0.
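For anyone hitting the same thing, the workaround described above — keep the parent's copy of the model on the CPU and call .cuda() only inside each worker, so no CUDA storage has to be pickled across the process boundary — can be sketched without torch. TinyNet and to_cuda() here are hypothetical stand-ins for the example's Net and model.cuda(), so the pattern itself runs anywhere:

```python
import multiprocessing as mp

class TinyNet:
    """Hypothetical stand-in for the example's Net; no torch required."""
    def __init__(self):
        self.device = "cpu"
        self.weight = 1.0

    def to_cuda(self):
        # In the real code this would be model.cuda(); it is called *inside*
        # the worker, after the process has started, so only CPU state
        # crosses the process boundary.
        self.device = "cuda"
        return self

def train(model, q):
    model.to_cuda()  # move to the GPU only once we are in the subprocess
    q.put(model.device)

if __name__ == "__main__":
    mp.set_start_method("spawn")
    model = TinyNet()          # the parent's copy stays on the CPU
    q = mp.Queue()
    p = mp.Process(target=train, args=(model, q))
    p.start()
    print(q.get())             # cuda
    p.join()
```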

Thank you.

As far as I know, multiprocessing with CUDA has some restrictions on OS X.

Hogwild! training is not made to run on the GPU: with CUDA, the tensors have locks set in place, and you would most likely end up with the shared model parameters in a corrupted state, which I don't think CUDA will allow. You could set up a pool of processes that organizes the work for you, and that should run on the GPU, but it isn't Hogwild! training, if that's what you're looking for.
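The "pool of processes" alternative mentioned above can be sketched with the standard library. Note the difference in character: each worker owns its own shard of work and the results are combined afterwards, rather than Hogwild!'s lock-free updates to one shared model. The train_shard function is an illustrative placeholder, not part of the example:

```python
import multiprocessing as mp

def train_shard(shard):
    # Placeholder "work": each worker processes only its own slice of data,
    # so nothing is shared between workers while they run.
    return sum(shard)

if __name__ == "__main__":
    shards = [[1, 2], [3, 4], [5, 6]]
    with mp.Pool(processes=3) as pool:
        partials = pool.map(train_shard, shards)  # one shard per worker
    print(sum(partials))  # 21
```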


You can refer to the PyTorch documentation for more info here: http://pytorch.org/docs/master/notes/multiprocessing.html

So CUDA tensors have locks, which is why Hogwild! cannot be used with them? And CPU tensors don't have locks? I didn't know about that.

Yes, Hogwild! training is a special lock-free approach to training that exploits some of the benefits of a multi-core CPU when the time taken by locks has become a bottleneck for certain model training.
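To make the "lock-free" point concrete, here is a small stdlib sketch of the trade-off Hogwild! accepts: several processes do read-modify-write updates on one shared value with no lock, so some updates can be lost. Hogwild! tolerates this because SGD updates are sparse and the algorithm is robust to the occasional overwritten write. The names here are illustrative, not from the example:

```python
import multiprocessing as mp

def bump(counter, n):
    # Hogwild!-style update: no lock around the read-modify-write, so two
    # processes can interleave and overwrite each other's increments.
    for _ in range(n):
        counter.value += 1

if __name__ == "__main__":
    counter = mp.Value("i", 0, lock=False)  # shared int, deliberately unlocked
    workers = [mp.Process(target=bump, args=(counter, 10_000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Often less than 40000: lost updates are the price of skipping locks.
    print(counter.value)
```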


Yes, I just didn't know that CUDA tensors have locks. Thank you very much!

By the way, which doc says that CUDA tensors have locks?

Oh, OK. It's my understanding that only simple atomic operations can be done without locks on CUDA, and many parameter-updating operations require some locking structure. So CUDA tensors don't require locks in general, but for your needs they would. Sorry, I shouldn't have written in absolute terms. I would see if you can ask someone from NVIDIA for more information on this, as they may know more, may have a way to do it, and may also have some docs for you.

Ah, I thought all of PyTorch's CUDA tensors have locks, but I just couldn't find any docs about it. If they don't, I'm okay with that. Thank you very much!