GPU 0 memory usage in DistributedDataParallel

The amount of memory used on GPU 0 seems to go up linearly with the number of GPUs for my model.
With DDP on 1 GPU, the memory usage is 8 GB; on 8 GPUs it is 12 GB.
I am not computing anything specifically on GPU 0, and the whole model is wrapped in DDP.
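For reference, here is a simplified sketch of how I set things up (ToyModel and the tensor shapes are placeholders for my actual model; I launch one process per GPU with torchrun):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyModel(nn.Module):
    # stand-in for my actual network
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(1024, 1024)

    def forward(self, x):
        return self.net(x)

def main():
    # One process per GPU, launched with: torchrun --nproc_per_node=<N> train.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # pin this process to its own device

    model = ToyModel().to(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # dummy forward/backward pass standing in for my training loop
    x = torch.randn(32, 1024, device=local_rank)
    model(x).sum().backward()

    # each process reports memory allocated on its own GPU
    print(f"rank {dist.get_rank()}: "
          f"{torch.cuda.memory_allocated(local_rank) / 2**20:.1f} MiB allocated")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```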

Is this normal or expected?