How to share a GPU tensor between multiple processes

My raw GPU tensor is so big that I can't afford to copy it into every process when using multiprocessing.
I need multiprocessing because I have to do calculations on both the CPU and the GPU.
Is there any way to do that? Should I use libtorch, CuPy, or something else?
Here is my code:

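import torch


def task(o):
    # In the child, `o` is rebuilt from a handle sent by the parent, so
    # data_ptr() may differ from the parent's value.
    print(o.data_ptr())
    print(o, '034')
    # Run the addition on a separate CUDA stream.
    stream = torch.cuda.Stream()
    with torch.cuda.stream(stream):
        print(o + o)


if __name__ == '__main__':
    # Choose 'spawn' before any CUDA work: CUDA cannot be
    # re-initialized in a forked child process.
    torch.multiprocessing.set_start_method('spawn')
    s_mp = []
    o = torch.randn((5000, 5000)).pin_memory().cuda()
    print(o.data_ptr(), 3434)
    for i in range(4):
        s_mp.append(torch.multiprocessing.Process(target=task, args=(o,)))
    for mp in s_mp:
        mp.start()
    # The parent keeps `o` alive until every child has finished with it.
    for mp in s_mp:
        mp.join()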
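A minimal sketch, assuming a recent PyTorch, that checks whether the data is actually copied: each child writes one row of a tensor created in the parent, and the parent reads those writes back afterwards. As far as I understand the PyTorch multiprocessing notes, CUDA tensors passed through `torch.multiprocessing` are shared via CUDA IPC handles rather than copied (the sending process must keep the original tensor alive while children use it), and CPU tensors are moved into shared memory. The names `worker`, `run`, `n_workers` and `n_cols` are hypothetical, and the script falls back to a CPU tensor when no GPU is available.

```python
import torch
import torch.multiprocessing as mp


def worker(shared, rank):
    # Hypothetical helper: `shared` arrives through torch.multiprocessing's
    # pickling hooks (CUDA IPC handle for GPU tensors, shared-memory storage
    # for CPU tensors), so this writes into the parent's memory.
    shared[rank].fill_(float(rank + 1))
    if shared.is_cuda:
        # Make sure the kernel has finished before this process exits.
        torch.cuda.synchronize()


def run(n_workers=4, n_cols=8):
    # Assumption: fall back to CPU so the sketch also runs without a GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    shared = torch.zeros(n_workers, n_cols, device=device)
    if not shared.is_cuda:
        shared.share_memory_()  # CPU case: move storage into shared memory
    ctx = mp.get_context("spawn")  # 'spawn' is required for CUDA children
    procs = [ctx.Process(target=worker, args=(shared, r)) for r in range(n_workers)]
    for p in procs:
        p.start()
    # The parent owns the allocation and keeps it alive until all
    # children are done; joining guarantees that.
    for p in procs:
        p.join()
    return shared.cpu()


if __name__ == "__main__":
    result = run()
    print(result[:, 0])  # tensor([1., 2., 3., 4.])
```

Run it as a script file: with `spawn`, each child re-imports the main module, so the code that starts processes has to stay under the `if __name__ == "__main__":` guard. If the first column comes back as 1, 2, 3, 4, the children wrote into the parent's tensor and no per-process copy of the data was made.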

Thanks for taking the time to read this.