I am confused about DistributedDataParallel

I have two GPU cards, each with 12 GB of memory.
During training with a batch size of 1, my model consumes about 7 GB on a single GPU.
Now I run my model on both GPUs with DDP. I expected the memory consumption on each GPU to be about 3.5 GB, but in fact both use 7 GB.
Is this normal? What should I do to achieve that?

If a single sample uses approximately 7 GB on one GPU, that is the minimum memory usage you can get with DDP, since the GPUs cannot process "half samples".
Given that both devices use 7 GB, I assume both are processing a full sample in your script.
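To make the arithmetic concrete, here is a minimal sketch of why DDP does not halve per-GPU memory: DDP shards the *data* across ranks, not the model, so every GPU holds a full model replica plus the activations for its own local batch. The numbers below (`model_gb`, `act_gb_per_sample`) are illustrative assumptions, not measurements from your model.

```python
def ddp_mem_per_gpu(model_gb: float, act_gb_per_sample: float, local_batch: int) -> float:
    """Rough per-GPU memory under DDP: a full model replica on every rank,
    plus activations for that rank's local batch (DDP replicates the model)."""
    return model_gb + act_gb_per_sample * local_batch

# Illustrative split only: suppose the replica costs 3 GB and each sample ~4 GB.
single_gpu = ddp_mem_per_gpu(3.0, 4.0, local_batch=1)  # one GPU, batch size 1
ddp_each = ddp_mem_per_gpu(3.0, 4.0, local_batch=1)    # each of two GPUs still gets batch 1
print(single_gpu, ddp_each)  # per-GPU memory does not drop under DDP
```

Since the smallest local batch is one sample, the per-sample memory is the floor; splitting the *model* itself across devices would require model parallelism or a sharding approach instead of DDP.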