`tensor.share_memory_()` will move the tensor data to shared memory on the host so that it can be shared between multiple processes. As described in the docs, it is a no-op for CUDA tensors. I don't quite understand the "in a single GPU instead of multiple GPUs" part, since this type of shared memory is not used on the GPU (i.e. it's not CUDA's kernel-level shared memory).
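A minimal sketch of what this enables, using `torch.multiprocessing` (the `worker` function here is just illustrative):

```python
import torch
import torch.multiprocessing as mp

def worker(t):
    # Writes through the shared storage; the parent sees the update.
    t.add_(1)

if __name__ == "__main__":
    tensor = torch.zeros(3)
    tensor.share_memory_()     # move storage to host shared memory (no-op for CUDA tensors)
    print(tensor.is_shared())  # True

    p = mp.Process(target=worker, args=(tensor,))
    p.start()
    p.join()
    print(tensor)              # tensor([1., 1., 1.]) -- update made in the child is visible
```

Since both processes reference the same underlying storage, the in-place `add_` in the child is visible to the parent without any copying.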