I am a little confused about the batch size in DistributedDataParallel. Each machine runs one process, and each process's DataLoader loads data with the specified batch size. As shown in this line [https://github.com/pytorch/pytorch/blob/master/torch/utils/data/distributed.py#L49], the DistributedSampler assigns each process its own subset of dataset indices according to its rank.
So, does that mean that if I set the batch size to 64 with two processes, the loss will effectively be averaged over 128 samples? Please correct me if there is something wrong with my understanding.
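To make my understanding concrete, here is a minimal sketch of what I think happens with the index sharding. The dataset, world size of 2, and batch size of 64 are just assumptions for illustration; I'm constructing the samplers for both ranks in one process only to inspect the split:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset of 256 samples (assumption for illustration).
dataset = TensorDataset(torch.arange(256))

world_size = 2   # assume 2 processes, e.g. one per machine
batch_size = 64  # per-process batch size passed to each DataLoader

for rank in range(world_size):
    # Each process builds its own sampler; DistributedSampler shards
    # the dataset indices by rank, so the processes see disjoint data.
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=False
    )
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    batch = next(iter(loader))[0]
    # Each process still gets 64 samples per step from its own shard.
    print(rank, batch.shape)

# So the global batch per optimizer step would be
# world_size * batch_size = 128 samples, if my reading is right.
```

If that is correct, each process computes its loss over its own 64 samples, and the gradient averaging across processes is what makes the step correspond to 128 samples overall.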