Using torch.Tensor over multiprocessing.Queue + Process fails

I had a related question that I posted here - CUDA tensors on multiprocessing queue.

In our application, we have a bunch of workers that put CUDA tensors onto a shared queue that is read by the main process. It seems the workers need to keep the CUDA tensors alive in memory until the main process has actually read them. There are two issues here. First, the main process runs fine without throwing any errors, but the tensors it reads often contain garbage values; it would be great if this triggered an exception instead. Second, this constraint seems to force some communication in the reverse direction as well, from the main process back to the workers. For example, we could hold the tensors in a temporary buffer inside each worker and clear it periodically based on acknowledgements from the main process. Is this the best way to handle such a use case?
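For concreteness, here is a minimal sketch of the acknowledgement scheme I have in mind. The queue names, the per-worker ack queue, and the `pending` dict are all hypothetical; the only point is that each worker keeps a reference to every CUDA tensor it has sent until the main process confirms it has copied the data out.

```python
import torch
import torch.multiprocessing as mp


def worker(rank, data_queue, ack_queue, num_items):
    pending = {}  # tensor_id -> tensor; keeps the CUDA memory alive
    for i in range(num_items):
        t = torch.full((4,), float(i), device="cuda")
        tensor_id = (rank, i)
        pending[tensor_id] = t            # hold a reference until acked
        data_queue.put((tensor_id, t))

        # Drop references for any acknowledgements that have arrived so far.
        while not ack_queue.empty():
            pending.pop(ack_queue.get(), None)

    # Wait for the remaining acknowledgements before exiting, otherwise the
    # tensors could be freed while the main process is still reading them.
    while pending:
        pending.pop(ack_queue.get(), None)


if __name__ == "__main__":
    mp.set_start_method("spawn")
    num_workers, num_items = 2, 3
    data_queue = mp.Queue()
    ack_queues = [mp.Queue() for _ in range(num_workers)]

    procs = [
        mp.Process(target=worker, args=(r, data_queue, ack_queues[r], num_items))
        for r in range(num_workers)
    ]
    for p in procs:
        p.start()

    for _ in range(num_workers * num_items):
        (rank, i), t = data_queue.get()
        value = t.clone()                  # copy out before acknowledging
        ack_queues[rank].put((rank, i))    # worker may now drop its reference
        print(rank, i, value)

    for p in procs:
        p.join()
```

This works in my tests, but it effectively doubles the plumbing (a data queue forward plus an ack queue per worker), which is why I'm asking whether there is a simpler or more idiomatic pattern.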