Share device-allocated tensor with multiple processes / threads

Is there a way to share a device-allocated tensor with multiple processes / threads such that all processes / threads read from the same memory region instead of each having its own copy of the tensor? I know that, using the CUDA IPC API, you can share an array allocated with cudaMalloc across multiple processes, so this should be possible in PyTorch.
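
For context, here is a minimal sketch of the kind of sharing I mean, assuming that `torch.multiprocessing` with the `spawn` start method passes a CUDA IPC handle rather than copying the data (the function and variable names below are just for illustration):

```python
import torch
import torch.multiprocessing as mp


def reader(shared, rank):
    # Assumption: the tensor received here maps the same device memory
    # as in the parent process (shared via a CUDA IPC handle), so no
    # per-process copy of the data is made.
    print(f"worker {rank} sees sum = {shared.sum().item()}")


if __name__ == "__main__":
    mp.set_start_method("spawn")  # required when sharing CUDA tensors
    shared = torch.arange(10, device="cuda", dtype=torch.float32)

    procs = [mp.Process(target=reader, args=(shared, r)) for r in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

If this does use CUDA IPC under the hood, I assume the producing process has to keep the tensor alive for as long as the consumers read from it. Is this the intended way to do it, or is there a lower-level API (e.g. something exposing the IPC handle directly) that I should use instead?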