Which one to use? DataParallel vs DistributedDataParallel

I have a machine with 32 vCPUs and 4 GPUs, and I need to train an ImageNet model from scratch. Which of DataParallel or DistributedDataParallel would be faster and keep all 4 GPUs highly utilized?
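DistributedDataParallel is generally preferred, even on a single machine. It runs one process per GPU, so the CPU-side overhead of the model (Python execution, kernel launches) is parallelized as well, whereas DataParallel drives all GPUs from a single Python process. That difference matters most when the model's CPU overhead is significant.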
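For reference, here is a minimal sketch of the one-process-per-GPU pattern, assuming PyTorch is installed. The toy `nn.Linear` model, the random tensors, port `29500`, and the helper names `pick_backend`, `train_worker` and `main` are illustrative placeholders, not an ImageNet training script. On a machine without GPUs it falls back to the `gloo` backend with two CPU processes so it can still be run. For real training you would replace the random tensors with a `DataLoader` using `DistributedSampler`, and you could launch with `torchrun` instead of `mp.spawn`.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def pick_backend() -> str:
    # NCCL for GPU-to-GPU communication; gloo is the CPU fallback.
    return "nccl" if torch.cuda.is_available() else "gloo"


def train_worker(rank: int, world_size: int) -> None:
    # Each spawned process owns exactly one GPU (or one CPU worker).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")  # placeholder port
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(pick_backend(), rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}") if use_cuda else torch.device("cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    # Toy model; DDP broadcasts rank 0's parameters at construction,
    # so every replica starts from identical weights.
    model = nn.Linear(10, 1).to(device)
    ddp_model = DDP(model, device_ids=[rank] if use_cuda else None)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # Placeholder batch; real code would read this rank's shard of the
    # dataset through a DataLoader with DistributedSampler.
    inputs = torch.randn(8, 10, device=device)
    targets = torch.randn(8, 1, device=device)

    optimizer.zero_grad()
    loss = loss_fn(ddp_model(inputs), targets)
    loss.backward()  # gradients are all-reduced across processes here
    optimizer.step()

    dist.destroy_process_group()


def main() -> None:
    # One process per visible GPU; two CPU processes if there are none.
    world_size = torch.cuda.device_count() or 2
    mp.spawn(train_worker, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    main()
```

With 4 GPUs this starts 4 independent Python processes, which is what lets DistributedDataParallel spread the CPU work across your 32 vCPUs instead of funnelling it through one interpreter as DataParallel does.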

