CUDA invalid device pointer

Hi!

While doing some multiprocessing, I ran into this error:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
    obj = ForkingPickler.dumps(obj)
  File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/dylan/.virtualenvs/testGo/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 104, in reduce_storage
    metadata = storage._share_cuda_()
RuntimeError: invalid device pointer: 0x204bc6a00 at /home/dylan/Desktop/superGo/pytorch/aten/src/THC/THCCachingAllocator.cpp:259

The context of this error is the following:
I launch my training in a separate process (this works fine). In that process I initialize the model (player) and train a deepcopy of it (new_player). At some point during training, I want to asynchronously launch another process to evaluate the two models against each other, like this:

pool = MyPool(1)
pool.apply_async(evaluate, args=(player, new_player,), callback=new_agent)

(MyPool is a subclass of multiprocessing.pool.Pool whose worker processes have daemon set to False.)

It seems that the parameters of the models can’t be copied to the new process for some reason.
Any idea how to fix this?

I’m using Python 3.5.2 and PyTorch built from source, version 0.4.0a0+d93d41b.
Thanks!


So I managed to reproduce the error with this code:

Also, I didn’t notice this at first, but the copy is correctly sent to the second new_process and only fails to get copied to the third?
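
One workaround that may help here, assuming the failure comes from pickling CUDA storages for the worker process (the traceback fails inside storage._share_cuda_()): send CPU copies of the parameters instead of the models themselves, and rebuild the models inside the worker. This is only a minimal sketch. The helper to_cpu_state, the nn.Linear stand-in model, and the body of evaluate are all hypothetical and not taken from the original code:

```python
import torch
import torch.nn as nn


def to_cpu_state(model):
    """Return a copy of the model's state_dict with every tensor on the CPU.

    Hypothetical helper: CPU tensors are pickled without going through
    the CUDA sharing path that raises in the traceback above.
    """
    return {name: tensor.detach().cpu().clone()
            for name, tensor in model.state_dict().items()}


def build_model():
    # Stand-in for the real network (assumption: the real one can be
    # rebuilt from scratch inside the worker process).
    return nn.Linear(4, 2)


def evaluate(player_state, new_player_state):
    """Rebuild both models from CPU state dicts and compare them.

    The comparison is a placeholder for the real evaluation games.
    """
    player = build_model()
    new_player = build_model()
    player.load_state_dict(player_state)
    new_player.load_state_dict(new_player_state)

    x = torch.ones(1, 4)
    with torch.no_grad():
        # Placeholder criterion: True if new_player scores at least as high.
        return bool(new_player(x).sum() >= player(x).sum())
```

With that, the call from the question would become something like pool.apply_async(evaluate, args=(to_cpu_state(player), to_cpu_state(new_player)), callback=new_agent). If the worker needs the GPU, it can move the rebuilt models there itself, but CUDA can only be used in a subprocess started with the spawn or forkserver start method.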

Is it solved now?
