While doing some multiprocessing, I ran into this error:

```
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
    obj = ForkingPickler.dumps(obj)
  File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/dylan/.virtualenvs/testGo/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 104, in reduce_storage
    metadata = storage._share_cuda_()
RuntimeError: invalid device pointer: 0x204bc6a00 at /home/dylan/Desktop/superGo/pytorch/aten/src/THC/THCCachingAllocator.cpp:259
```
The context of this error is the following:
I launch my training in a separate process (this works fine): the model (`player`) is initialized there, and a deepcopy of it (`new_player`) is trained on that same newly launched process. At some point during training I want to asynchronously launch another process to evaluate the models against each other, like this:

```python
pool = MyPool(1)
pool.apply_async(evaluate, args=(player, new_player), callback=new_agent)
```
(`MyPool` subclasses `multiprocessing.pool.Pool` with the workers' `daemon` flag set to `False`, so they are allowed to spawn child processes.)
It seems that the models' parameters can't be copied over to the new process for some reason!
Any idea how to fix this?
I'm using Python 3.5.2 and PyTorch built from source, version 0.4.0a0+d93d41b.