Does torch.multiprocessing require more GPU memory?

My training code takes up about 9 GB of GPU memory, and I'm running it on a 2080 Ti, which has 11 GB.

When I run this on a single GPU, the process is pretty stable and GPU memory usage stays under 10 GB.

However, when I add a second 2080 Ti and use torch.multiprocessing for parallel training, I've noticed that the training takes up a little more GPU memory and occasionally runs into an out-of-memory error.
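To make the setup concrete, here is a minimal sketch of the kind of per-GPU spawning I mean. The tiny model, the fake data, and the TCP address are just placeholders, and I'm assuming a DistributedDataParallel-style setup here; my real training code is much larger (around 9 GB per GPU).

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def train_worker(rank, world_size):
    # One process per GPU: rank 0 -> cuda:0, rank 1 -> cuda:1.
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:23456",  # placeholder address/port
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(rank)

    # Stand-ins for the real model and data.
    model = torch.nn.Linear(1024, 1024).cuda(rank)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):
        x = torch.randn(32, 1024, device=rank)
        loss = model(x).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # two 2080 Ti cards
    mp.spawn(train_worker, args=(world_size,), nprocs=world_size)
```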

I am wondering if this increase in GPU memory usage is expected.
The documentation says that when a CUDA tensor is shared between processes, the sending process has to keep the original tensor alive as long as the receiving process holds a copy of it (Multiprocessing package - torch.multiprocessing — PyTorch 1.8.1 documentation).
Is this why the multiprocessing case consumes a little more memory?
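For reference, this is roughly how I've been checking PyTorch's own memory counters inside each worker (just a sketch; `report_memory` is a helper name I made up, and it assumes the worker has already called `torch.cuda.set_device`):

```python
import torch


def report_memory(rank):
    # Memory the caching allocator has actually handed out to tensors on this GPU.
    allocated = torch.cuda.memory_allocated(rank) / 1024 ** 3
    # Memory the allocator has reserved from the driver (includes cached blocks).
    reserved = torch.cuda.memory_reserved(rank) / 1024 ** 3
    # High-water mark of allocated memory for this process.
    peak = torch.cuda.max_memory_allocated(rank) / 1024 ** 3
    print(f"[rank {rank}] allocated={allocated:.2f} GB, "
          f"reserved={reserved:.2f} GB, peak={peak:.2f} GB")
```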