I am trying to speed up a multidirectional RNN with torch.multiprocessing, since I cannot get it to run efficiently on the GPU but have access to many CPUs on a cluster. The one place where I want to apply multiprocessing is the for-loop over the different directions: they are computed completely independently, and only after all iterations have finished are the results combined in a final layer.
So instead of
    results = []
    for d in directions:
        results.append(self.forward_direction(batch, param1, param2))
I tried doing
    pool = mp.Pool(processes=self.num_directions)
    results = pool.starmap(self.forward_direction, (batch, param1, param2))
This results in
MaybeEncodingError: Error sending result: '[tensor([...], grad_fn=<StackBackward>)]'. Reason: 'RuntimeError('Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).',)'
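For reference, here is a minimal sketch that appears to trigger the same refusal without a pool. It assumes that torch.multiprocessing registers its tensor reducers on the standard library's ForkingPickler, which matches the error text above. The names `w`, `out`, and `try_to_send` are made up for illustration and are not part of my model.

```python
from multiprocessing.reduction import ForkingPickler

import torch
import torch.multiprocessing  # assumption: importing this registers the tensor reducers


def try_to_send(tensor):
    """Pickle a tensor the way a worker would; return the error text, or None on success."""
    try:
        ForkingPickler.dumps(tensor)
    except RuntimeError as err:
        return str(err)
    return None


# Hypothetical stand-in for a model parameter: a leaf tensor that requires grad.
w = torch.ones(3, requires_grad=True)

# Hypothetical stand-in for one direction's output: a non-leaf tensor with a grad_fn.
out = torch.stack([w * 2, w * 3])

error = try_to_send(out)
print("is_leaf:", out.is_leaf, "requires_grad:", out.requires_grad)
print("error:", error)

# detach() returns a tensor that is cut off from the autograd graph.
detached = out.detach()
print("detached grad_fn:", detached.grad_fn)
```

Calling detach() as the message suggests would let the tensor through, but the detached tensor has no grad_fn, so gradients could not flow back through it in the parent process.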
I have not found a solution yet. Is it even possible to run the for-loop in multiple processes without breaking autograd?