DistributedDataParallelCPU allreduce implementation

In the implementation of DistributedDataParallelCPU, it looks like we set up the allreduce hook on every layer of the model, but allreduce_params() all-reduces the gradients of the whole model every time it is triggered. My understanding is that we should allreduce only once per iteration, yet it seems DistributedDataParallelCPU does it multiple times. Did I miss anything?
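To make the concern concrete, here is a minimal stand-alone sketch (not the actual PyTorch source; `FakeParam` and the counter are made up for illustration) of what would happen if a hook that all-reduces every gradient were registered once per parameter: the whole-model allreduce would fire once per parameter rather than once per iteration.

```python
# Hypothetical sketch illustrating the question, NOT the real
# DistributedDataParallelCPU code: if each parameter's backward hook
# calls allreduce_params(), the whole-model reduction runs once per
# parameter instead of once per iteration.

class FakeParam:
    def __init__(self, name):
        self.name = name
        self.grad = 1.0

params = [FakeParam(f"layer{i}.weight") for i in range(4)]
allreduce_calls = 0

def allreduce_params():
    # Stand-in for dist.all_reduce over every gradient in the model.
    global allreduce_calls
    allreduce_calls += 1
    for p in params:
        pass  # placeholder for the collective op on p.grad

# Registering the same hook on every parameter means it fires per-parameter.
hooks = [allreduce_params for _ in params]

# Simulated backward pass: autograd would invoke each parameter's hook once.
for hook in hooks:
    hook()

print(allreduce_calls)  # 4 parameters -> the "whole model" reduction ran 4 times
```

If the real implementation behaves like this sketch, that would indeed be redundant; if instead the hook only marks gradients ready and the actual collective is deferred until all of them are, the reduction would still happen once per iteration.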