How to reuse a shared-memory (shm) tensor in DataLoader

I use PyTorch’s DataLoader with num_workers > 0. A single batch in my problem is very large (~100 GB) and consumes a lot of CPU memory. All batches have the same size.

From my understanding, if I return my batch from my dataset’s __getitem__(), the loader worker process first moves it to shared memory (shm) and then sends the handle to the main process. However, there are two problems:

  1. While the worker copies its process-local tensor into shm, peak memory consumption is double the tensor size;
  2. For each batch, the worker has to allocate a large chunk of contiguous memory, allocate another chunk in shm, and then free the process-local chunk, only to reallocate the same size for the next batch. The shm chunk is eventually freed by the main process and has to be allocated by the worker again.
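
For reference, the setup I am describing looks roughly like the sketch below. The class name, function name, and shapes are placeholders I made up for illustration (my real batches are far larger). I use batch_size=None so that one dataset item is one full batch, and the tensor's is_shared() method to check whether what arrives in the main process is backed by shared memory.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FreshTensorDataset(Dataset):
    """Hypothetical stand-in for my dataset: one item is one full batch."""

    def __init__(self, num_batches, batch_shape):
        self.num_batches = num_batches
        self.batch_shape = batch_shape

    def __len__(self):
        return self.num_batches

    def __getitem__(self, idx):
        # A new process-local tensor is allocated for every batch.
        return torch.full(self.batch_shape, float(idx))


def inspect_batches(num_batches=3, batch_shape=(4, 8), num_workers=1):
    loader = DataLoader(
        FreshTensorDataset(num_batches, batch_shape),
        batch_size=None,  # disable automatic batching
        num_workers=num_workers,
    )
    # Record the shape of each received tensor and whether it lives in shm.
    return [(tuple(b.shape), b.is_shared()) for b in loader]
```

Calling inspect_batches() returns one (shape, is_shared) pair per batch, which is how I convinced myself that the tensors are moved to shm on the way out of the worker.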

So my question is: is there a way to reuse the shm tensor so that I can:

  1. populate the shm tensor directly in the worker process instead of allocating a worker-process-local tensor first;
  2. reuse the shm tensor between the worker process and the main process instead of allocating and freeing it repeatedly?
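
To make this concrete, here is a minimal sketch of the pattern I have in mind. I do not know whether it is the recommended approach. It preallocates a small pool of tensors with share_memory_() in the main process, has the worker fill a slot in place, and sends only the slot index back. SharedSlotDataset, run_demo, and the slot-count rule (num_workers * prefetch_factor + 1) are my own assumptions. The rule relies on in-order delivery with no shuffling, and on the main process being done with a batch before it asks for the next one.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SharedSlotDataset(Dataset):
    """Hypothetical dataset that writes each batch into a preallocated shm slot."""

    def __init__(self, num_batches, batch_shape, num_slots):
        self.num_batches = num_batches
        # Allocated once, up front, in the main process and moved to shared
        # memory, so the workers write into the same storage.
        self.slots = [
            torch.empty(batch_shape).share_memory_() for _ in range(num_slots)
        ]

    def __len__(self):
        return self.num_batches

    def __getitem__(self, idx):
        slot_id = idx % len(self.slots)
        # Fill the shared slot in place; stand-in for the real loading logic.
        self.slots[slot_id].fill_(float(idx))
        # Only a small integer crosses the worker-to-main queue.
        return slot_id


def run_demo(num_batches=6, batch_shape=(4, 8), num_workers=1, prefetch_factor=1):
    # Assumption: one slot per batch that can be in flight, plus the one the
    # main process is currently consuming.
    num_slots = num_workers * prefetch_factor + 1
    dataset = SharedSlotDataset(num_batches, batch_shape, num_slots)
    loader = DataLoader(
        dataset,
        batch_size=None,  # one dataset item is already a full batch
        shuffle=False,
        num_workers=num_workers,
        prefetch_factor=prefetch_factor,
    )
    seen = []
    for slot_id in loader:
        batch = dataset.slots[slot_id]  # no copy: a view of the shared slot
        seen.append(batch[0, 0].item())
    return seen
```

With this layout the whole pool has to fit in shared memory at once, and any batch that must outlive the next iteration would need to be copied out of its slot before the worker overwrites it.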