Two processes: inference and backward running at the same time?

My goal is to run real-time inference and the backward pass in PyTorch.

Suppose I have a stream of input tensors arriving at different times. I created two torch.multiprocessing.Process instances, inference_process and backward_process.
I would like these two processes to run simultaneously: after inference_process forwards an input tensor, it sends the output tensor to backward_process. backward_process then computes the gradient while inference_process forwards the next tensor.

In inference_process, I want to do this:
output_buffer = my_model(input_buffer)

And in backward_process, I want to get the output buffer from the inference_process:
output_buffer = output_buffer_stream.get()

Here output_buffer_stream is a torch.multiprocessing.Queue().

However, I get the following error at output_buffer_stream.put(output_buffer):
RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).
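For what it's worth, the error is raised during serialization itself, not by the queue machinery: torch.multiprocessing.Queue pickles tensors with ForkingPickler, so the refusal can be reproduced in a single process (a minimal sketch, with a stand-in tensor in place of my_model's output):

```python
import torch
import torch.multiprocessing  # importing this registers PyTorch's tensor reducers

# ForkingPickler is the serializer torch.multiprocessing.Queue uses under the hood
from multiprocessing.reduction import ForkingPickler

x = torch.ones(3, requires_grad=True)  # leaf tensor: would serialize fine
y = x * 2                              # non-leaf: carries a grad_fn

try:
    ForkingPickler.dumps(y)  # same serialization step as output_buffer_stream.put(...)
except RuntimeError as err:
    print(f"put() fails here: {err}")
```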

Any ideas on how to solve this? How can I share a tensor together with its gradient history between two processes? Should I try torch.distributed instead?

I suspect the reason is that the computation graph (.grad_fn) cannot be serialized, so a tensor attached to a computation graph cannot be shared between processes.
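Consistent with that suspicion: once detach() strips the grad_fn, the same tensor serializes without complaint (a minimal sketch; note the detached copy carries no autograd history, so the receiving process could not call backward() through it):

```python
import torch
import torch.multiprocessing  # registers PyTorch's tensor reducers
from multiprocessing.reduction import ForkingPickler

x = torch.ones(3, requires_grad=True)
out = x * 2              # non-leaf, has a grad_fn: refuses to serialize
sendable = out.detach()  # drops the grad_fn; serializes, but the graph is gone

payload = ForkingPickler.dumps(sendable)
print(f"detached tensor serialized into {len(payload)} bytes")
```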