I’m new to distributed training.
When I train with DistributedDataParallel, do I get the functionality of DataParallel? That is, can I assume that on a single node with more than one GPU, all GPUs on that node will be utilized?
Yep, DistributedDataParallel (DDP) can utilize multiple GPUs on the same node, but it works differently from DataParallel (DP): DDP uses multiple processes, one process per GPU, while DP is single-process and multi-threaded.
See this page for the comparison between the two: https://pytorch.org/tutorials/beginner/dist_overview.html#data-parallel-training
and this to get started with DDP: https://pytorch.org/docs/stable/notes/ddp.html
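To make the "one process per GPU" point concrete, here is a minimal DDP sketch. It runs as a single process with `world_size=1` on the `"gloo"` backend so it works even on CPU; on a real multi-GPU node you would instead launch one process per GPU (e.g. with `torchrun --nproc_per_node=NUM_GPUS`) and typically use the `"nccl"` backend. The model and tensor sizes are toy assumptions for illustration.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group for illustration; with torchrun, rank/world_size
# come from environment variables and there is one process per GPU.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",
    rank=0,
    world_size=1,
)

model = torch.nn.Linear(10, 1)  # toy model
ddp_model = DDP(model)          # gradients are all-reduced across ranks
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

loss = ddp_model(torch.randn(8, 10)).sum()
loss.backward()                 # gradient synchronization happens here
optimizer.step()

dist.destroy_process_group()
```

Each process owns its GPU and its shard of the data; DDP only synchronizes gradients during `backward()`, which is a big part of why it usually outperforms DP even on a single node.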
Great, thanks for the answer and references