Torch multiprocessing

Hi Folks,

I’m having a strange issue that I’ve already spent two days on :). I guess it is related to this example:

x = queue.get()
x_clone = x.clone()
queue_2.put(x_clone)

So if tensor x is on the GPU, as far as I understand, you need to clone it on the consumer side.
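For reference, this is how I read that snippet as a complete toy example (the Event, the tensor size, and the process layout are my own additions, and I'm assuming the spawn start method, which CUDA tensors in multiprocessing require):

import torch
import torch.multiprocessing as mp

def consumer(queue, queue_2, done):
    x = queue.get()        # arrives sharing the producer's CUDA allocation
    x_clone = x.clone()    # take an independent copy right away
    queue_2.put(x_clone)
    done.wait()            # keep this process (and x_clone) alive until the producer has read it

if __name__ == "__main__":
    ctx = mp.get_context("spawn")              # spawn is required for CUDA tensors
    queue, queue_2 = ctx.Queue(), ctx.Queue()
    done = ctx.Event()
    p = ctx.Process(target=consumer, args=(queue, queue_2, done))
    p.start()
    x = torch.randn(4, device="cuda")
    queue.put(x)                               # keep x referenced here while the consumer uses it
    y = queue_2.get().clone()                  # clone on this side as well before releasing the consumer
    done.set()
    p.join()
    print(y)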

First question: if I have an object A which holds references to four tensors, what is the right way to clone the entire object with all its tensors? I.e. do you need to first move them to the CPU, put the object in the queue, and then move them back to the GPU? Because if I detach().clone() while the tensors are on the GPU, the data always comes out as zeros.

class A:
    def __init__(self):
        self._observations = None   # tensor
        self._actions = None        # tensor

    def clone(self, dst):
        dst._observations = self._observations.detach().clone().cpu()
        dst._actions = self._actions.detach().clone().cpu()
        return dst

This is the one I use:

with network_lock:
    x = A()
    network.forward(x)
    new_x = A()
    # new_x is a fresh object; x.clone(new_x) detaches the tensors and moves them to the CPU
    # note that I've tried detach() and clone() in different combinations but can't get it to work
    # NOTE: when the device is set to CPU everything works
    queue.put(x.clone(new_x))
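On the consumer side the plan is then roughly the reverse; restore_to_gpu and the hard-coded device below are just placeholders for illustration:

def restore_to_gpu(item, device="cuda"):
    # everything that came through the queue is already a detached CPU copy,
    # so it should be safe to move it back to the GPU on this side
    item._observations = item._observations.to(device)
    item._actions = item._actions.to(device)
    return item

# consumer loop:  x = restore_to_gpu(queue.get())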

My second question: if the tensors are on the GPU, what is the right way to put an object A that holds a bunch of GPU tensors into a queue?

    def clone(self, dst):
        dst._observations = self._observations.detach().clone()
        dst._actions = self._actions.detach().clone()
        return dst

In the second example, I notice that as soon as the object goes into the queue, the data of any tensor on the GPU becomes zero; the values are all zeros when the consumer picks the object up from the queue.
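One thing I'm still trying to rule out is the lifetime requirement from the "CUDA in multiprocessing" note: the sending process has to keep the original tensor alive for as long as the receiving process retains a copy of it. Here is the kind of sketch I'm using to test that (the Batch class and the ack queue are placeholders I made up, not my real code):

import torch
import torch.multiprocessing as mp

class Batch:                      # stand-in for my class A
    def __init__(self, obs, act):
        self._observations = obs
        self._actions = act

def worker(queue, ack):
    b = queue.get()
    # copy both tensors out of the shared CUDA memory before acknowledging
    obs, act = b._observations.clone(), b._actions.clone()
    ack.put(True)                 # only now may the producer drop its reference
    print(obs.sum().item(), act.sum().item())

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue, ack = ctx.Queue(), ctx.Queue()
    p = ctx.Process(target=worker, args=(queue, ack))
    p.start()
    snapshot = Batch(torch.ones(8, device="cuda"),
                     torch.full((8,), 2.0, device="cuda"))
    queue.put(snapshot)           # snapshot stays referenced here in the producer...
    ack.get()                     # ...until the worker says it has its own copies
    p.join()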

Hi, this question seems to be more related to tensor copy methods. Perhaps it would get more attention if it were moved to the relevant sub-forum? Thanks!

It might, but I'm not 100% sure, because I've tried different combinations. So I guess it is essential to understand how exactly Torch shares tensors between the various processes, specifically torch.multiprocessing, which overrides Python's multiprocessing. My guess is that it somehow shares a pointer owned by CUDA, and I'm not sure whether Torch provides synchronized access or purely delegates to Python's multiprocessing.
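To check that guess, this is the small experiment I have in mind (just a sketch, the names are mine): if the consumer's in-place write shows up in the producer's tensor, the queue really handed over a pointer to the same CUDA allocation rather than a copy, and any synchronization on top of that would be up to us:

import torch
import torch.multiprocessing as mp

def consumer(queue, done):
    t = queue.get()       # if sharing works, this aliases the producer's CUDA memory
    t.fill_(42.0)         # in-place write into that memory
    done.put(True)

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue, done = ctx.Queue(), ctx.Queue()
    p = ctx.Process(target=consumer, args=(queue, done))
    p.start()
    x = torch.zeros(4, device="cuda")
    queue.put(x)
    done.get()
    torch.cuda.synchronize()
    print(x)              # 42s if the memory is truly shared, 0s if it was copied
    p.join()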

[W CUDAGuardImpl.h:62] Warning: CUDA warning: invalid device ordinal (function uncheckedSetDevice)
[W CUDAGuardImpl.h:46] Warning: CUDA warning: driver shutting down (function uncheckedGetDevice)

cc @VitalyFedyunin @SimonW

Folks,

After days of debugging, I think it is just a bug in ForkingPickler, so I am basically moving the code to RPC.

I think somewhere down the line there is a bug in the pipe. Essentially it ends up containing a tensor that I never put there in the first place, and that tensor has a grad attached to it.
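For anyone curious, the RPC route I'm switching to looks roughly like this; the worker names, the port and the toy payload are placeholders, and I move tensors to the CPU before the call because plain RPC serializes its arguments:

import os
import torch
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp

def consume(observations):
    # runs on worker1; the tensor arrives as its own copy
    return observations.sum().item()

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        obs = torch.randn(8, device="cuda")
        result = rpc.rpc_sync("worker1", consume, args=(obs.detach().cpu(),))
        print(result)
    rpc.shutdown()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)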

Just FYI