There are actually two issues here. The first is that mp.Manager().Queue() behaves differently from mp.Queue(): it throws an invalid device pointer error (regardless of the fix below). After debugging a bit more, it seems to be using the correct ForkingPickler from torch.multiprocessing, so the reason it fails is not obvious to me.
The second issue, which arises when the worker finishes early, exists for mp.Queue() too. I was able to resolve it with this suggestion from @colesbury, i.e. using an mp.Event() to keep the worker process alive until all tensors have been fetched in the main process.
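A minimal sketch of the mp.Event() workaround, assuming a CUDA device (it falls back to CPU so it also runs without a GPU). The `producer` and `run_demo` names, the tensor shapes, and the use of a `spawn` context are illustrative choices, not code from the original report. The worker keeps references to the tensors it sent and blocks on the event; the main process copies each tensor it receives and only then sets the event, so the worker cannot exit while the main process still depends on its memory.

```python
import torch
import torch.multiprocessing as mp


def producer(queue, done_event, num_tensors, device):
    """Hypothetical worker: sends tensors, then stays alive until told to exit."""
    sent = []  # keep references so the shared memory stays valid on this side
    for i in range(num_tensors):
        t = torch.full((4,), float(i), device=device)
        sent.append(t)
        queue.put(t)
    # Block until the main process signals it has copied every tensor;
    # returning earlier could free memory the consumer still needs.
    done_event.wait()


def run_demo(num_tensors=3):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    ctx = mp.get_context("spawn")  # CUDA in subprocesses needs spawn (or forkserver)
    queue = ctx.Queue()
    done_event = ctx.Event()
    worker = ctx.Process(target=producer, args=(queue, done_event, num_tensors, device))
    worker.start()

    results = []
    for _ in range(num_tensors):
        t = queue.get()
        results.append(t.clone().cpu())  # take a private copy owned by this process
        del t  # drop the shared handle as soon as the copy exists
    done_event.set()  # now the worker may exit safely
    worker.join()
    return results


if __name__ == "__main__":
    print([t.tolist() for t in run_demo()])
```

The ordering is what matters here: the main process clones every received tensor before calling `done_event.set()`, so by the time the worker returns from `wait()` and exits, nothing in the main process still points at the worker's memory. Without the event, the worker could return right after its last `put()` and release memory backing tensors that have not been received yet.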