Batch size in distributed data parallel

I am a little confused about the batch size in DistributedDataParallel. Each machine runs one process, and each process's DataLoader loads data with the specified batch size. As shown in this line[], each process handles its own subset of the dataset indices according to its rank.
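To make the per-rank indexing concrete, here is a minimal pure-Python sketch of how a rank-based sampler (in the spirit of `torch.utils.data.DistributedSampler`) could partition dataset indices, so each process only loads its own shard. The function name and padding scheme are illustrative assumptions, not the exact library implementation:

```python
def shard_indices(dataset_len, num_replicas, rank):
    """Return the subset of dataset indices assigned to `rank`.

    Illustrative sketch: pads the index list by wrapping around so every
    rank receives the same number of samples, then strides by num_replicas.
    """
    # smallest multiple of num_replicas that covers the dataset
    total = ((dataset_len + num_replicas - 1) // num_replicas) * num_replicas
    indices = list(range(dataset_len))
    indices += indices[: total - dataset_len]  # wrap-around padding
    return indices[rank::num_replicas]

# Example: a dataset of 10 samples split across 2 processes.
rank0 = shard_indices(10, 2, 0)  # even-positioned indices
rank1 = shard_indices(10, 2, 1)  # odd-positioned indices
```

Each rank then builds batches only from its own shard, so a per-process batch size of 64 means each process computes gradients on 64 distinct samples per step.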

So, with two machines, does that mean that if I set the batch size to 64, the loss will be averaged over 128 samples? Please correct me if there is something wrong with my understanding.