Questions about torch.multiprocessing

I’m trying to reduce the overhead of communicating between processes, and my idea is to exploit the shared-memory support of torch tensors. All the data I need to send between processes is numerical and can be stored as tensors. I have a few questions:

  • If I send a collection of tensors through a Queue, will those tensors be shared? Or is a tensor only shared when it is the sole object being sent through the Queue? In the latter case, can I manually mark each tensor in the collection as shared?
  • If tensors are shared, does that mean they don’t get pickled when sent through a Queue? What about a collection of tensors? What I really want to know is whether sending a collection of shared tensors between processes is fast. Note: the shapes of the tensors are subject to change, so I can’t just pack everything into one big tensor of fixed shape, unless I pad it.
  • How large is the overhead of copying data to shared memory and then sending it through a Queue, versus not using shared memory at all (i.e. just using Python’s standard multiprocessing module)?