Data distribution across nodes on a cluster when training with DDP

Subhash · August 17, 2022, 1:56pm

Hi Everyone,

I have a question about how data samples are distributed across a multi-node GPU cluster when training with DDP.

Do the data subsets assigned to each node remain the same across iterations (epochs)?

Thanks
Subhash

wanchaol · August 23, 2022, 4:37am

Logically, when you train with DDP, the data subsets that go to each node/rank are different, even within the same epoch, but together they form one global batch of data inputs.
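
To make the per-rank split concrete, here is a minimal pure-Python sketch of one common sharding scheme: every rank builds the same shuffled index list, then takes a strided slice of it. This is an illustration under assumptions, not DDP's own code. DDP itself does not split the data; that is usually left to a sampler such as `torch.utils.data.distributed.DistributedSampler`, which is not mentioned in this thread. The helper name `shard_indices` and its parameters are hypothetical.

```python
import random


def shard_indices(dataset_len, world_size, rank, epoch, seed=0):
    """Return the sample indices one rank would read in a given epoch.

    Hypothetical helper for illustration; assumes dataset_len >= world_size.
    """
    # Every rank seeds with (seed + epoch), so all ranks build the SAME
    # permutation and can split it without communicating.
    rng = random.Random(seed + epoch)
    indices = list(range(dataset_len))
    rng.shuffle(indices)

    # Pad by repeating the first few indices so each rank gets an equal share.
    per_rank = -(-dataset_len // world_size)  # ceiling division
    total = per_rank * world_size
    indices += indices[: total - dataset_len]

    # Strided slice: rank r takes positions r, r + world_size, ...
    return indices[rank:total:world_size]


if __name__ == "__main__":
    for epoch in range(2):
        shards = [shard_indices(10, 2, rank, epoch) for rank in range(2)]
        print(f"epoch {epoch}: {shards}")
```

In this sketch the shards are disjoint within an epoch and change from one epoch to the next because the seed includes the epoch number. With `DistributedSampler`, the equivalent step is calling `sampler.set_epoch(epoch)` at the start of each epoch; if that call is skipped, or shuffling is turned off, each rank keeps reading the same subset every epoch.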