"shared_cache" in rebuild_cuda_tensor seems to have trouble

Hello, I’m trying to share the same Tensor between two Processes.
In this case, the Tensor’s storage should go through the shared cache pool so that it can be accessed from both.

import torch
import torch.multiprocessing as mp

def sub_process(queue):
    tensor = queue.get()  # first receive: the tensor is rebuilt from the shared handle
    # do something...
    tensor = None         # drop the reference before the next receive

    tensor = queue.get()  # second receive: expected to be served from shared_cache
    # do something...
    tensor = None

if __name__ == '__main__':
    mp.set_start_method('spawn')
    recv_queue = mp.Queue()
    t1 = torch.rand(2, 2, device="cuda")
    p = mp.Process(name="sub_process", target=sub_process, args=(recv_queue,))
    p.start()

    recv_queue.put(t1)  # first send: shares the storage with the subprocess
    # update t1 ... 
    recv_queue.put(t1)  # second send: same storage, so I expect a receiver-side cache hit
    p.join()

What I expected was that the second queue.get() in sub_process would take far less time than the first one, since the tensor’s storage should be rebuilt from the cached memory pool via storage_from_cache.

However, I found that a matching entry does exist in shared_cache (line 298 in reductions.py), yet when _new_with_weak_ptr is called to reuse that storage, it returns None.
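My guess is that shared_cache only holds weak references to the storages, so setting tensor = None in sub_process frees the storage and leaves a dead entry behind: the key is still in the cache, but dereferencing it fails. A minimal stdlib analogy of what I think happens (plain weakref and a hypothetical FakeStorage instead of StorageWeakRef and a real CUDA storage):

import weakref

class FakeStorage:
    """Stand-in for a rebuilt CUDA storage (for illustration only)."""
    pass

shared_cache = {}  # handle -> weak reference, analogous to how I read shared_cache

storage = FakeStorage()
shared_cache["ipc_handle"] = weakref.ref(storage)

storage = None  # corresponds to `tensor = None` in sub_process above

# The cache entry still exists, but dereferencing it yields None,
# which matches _new_with_weak_ptr returning None in my debugging.
print(shared_cache["ipc_handle"]())  # -> None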

The storage cache works as expected only when I manage the object’s lifetime manually, i.e., without setting tensor = None.
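Concretely, a variant of sub_process that keeps a strong reference does hit the cache on the second get (an illustrative sketch of what I mean by managing the object manually):

def sub_process(queue):
    first = queue.get()   # keep a strong reference instead of setting it to None
    # do something...

    second = queue.get()  # now the cached storage is still alive and can be reused
    # do something...

    first = None
    second = None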

Should I manage the lifetime of every received object by hand to benefit from the shared cache pool, or is there a better way?


While debugging, I came up with another question.

It seems like senders also populate the Storage Cache.
I don’t think I understand the purpose of that code, since I could not find where the sender-side entries are actually used; all the lookups I found read the Storage Cache on the receiver side.
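To show exactly which code I mean, this snippet prints the sender-side reduction functions where I see shared_cache being written (function names taken from my version’s torch/multiprocessing/reductions.py; they may differ in yours):

import inspect
import torch.multiprocessing.reductions as reductions

# Search the printed source for "shared_cache": both sender-side functions
# insert into it, while lookups seem to happen only in the rebuild path.
print(inspect.getsource(reductions.reduce_tensor))
print(inspect.getsource(reductions.reduce_storage))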