Does SyncBN with DDP support different data sizes on different GPUs?

Hi,

I am working with a script where each training instance has a different number of points, so when I apply DDP with SyncBN, how do the BN stats on different GPUs get synced? I have traced the source code to here “https://github.com/pytorch/pytorch/blob/a4a5b6fcaae26fe241d32a7c4b2091ee69b600bb/torch/nn/modules/_functions.py” L33-L43

        # calculate global mean & invstd
        mean, invstd = torch.batch_norm_gather_stats_with_counts(
            input,
            mean_all,
            invstd_all,
            running_mean,
            running_var,
            momentum,
            eps,
            count_all.view(-1).long().tolist()
        )

In my case, “mean_all” and “invstd_all” should be a weighted average according to the different “counts” on each GPU. Is that actually what happens?
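
To make the question concrete, here is a small sketch (plain PyTorch, with made-up numbers, not the library’s code) of the count-weighted combination I have in mind for a single channel:

    import torch

    # Hypothetical per-GPU statistics for one channel: three GPUs with
    # different numbers of points contributing on each GPU.
    counts = torch.tensor([5., 8., 3.])       # points per GPU
    means = torch.tensor([0.2, -0.1, 0.4])    # per-GPU channel means
    vars_ = torch.tensor([1.0, 0.5, 2.0])     # per-GPU biased channel variances

    total = counts.sum()

    # Count-weighted global mean
    global_mean = (counts * means).sum() / total

    # Global variance: weighted within-GPU variance plus between-GPU spread
    global_var = (counts * (vars_ + (means - global_mean) ** 2)).sum() / total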

BTW, the SyncBN in NVIDIA Apex simply averages “mean_all” and “invstd_all”, which does not support different counts across GPUs.

Thanks very much

In my case, “mean_all” and “invstd_all” should be a weighted average according to the different “counts” on each GPU. Is that actually what happens?

I think you’re right.

torch.batch_norm_gather_stats_with_counts leads to aten/src/ATen/native/cuda/Normalization.cuh.

The function you’re looking for is batch_norm_reduce_statistics_kernel.
In the loop starting at L405, you can see that all statistics are weighted by their own count.

for (int j = 0; j < world_size; j++) {
  scalar_t count = counts[j];
  accscalar_t m = vec_mean[j][i];
  accscalar_t v = accscalar_t(1.0) / (vec_invstd[j][i]);
  v = (v * v - epsilon) * count;
  accscalar_t factor = 1.0 / (n + count);
  var_n += v + (avg - m) * (avg - m) * n * count * factor;
  avg = n * factor * avg + count * factor * m;
  n += count;
}
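
If it helps, here is a quick sanity check in plain Python/PyTorch (not the library’s code) that mirrors that merge loop and compares the result against statistics computed on all points at once:

    import torch

    torch.manual_seed(0)
    eps = 1e-5

    # Simulate 3 GPUs holding different numbers of points for one channel
    chunks = [torch.randn(n) for n in (5, 8, 3)]
    means = [c.mean().item() for c in chunks]
    invstds = [(1.0 / (c.var(unbiased=False) + eps).sqrt()).item() for c in chunks]
    counts = [float(c.numel()) for c in chunks]

    # Python version of the loop in batch_norm_reduce_statistics_kernel
    avg, var_n, n = 0.0, 0.0, 0.0
    for m, invstd, count in zip(means, invstds, counts):
        v = 1.0 / invstd
        v = (v * v - eps) * count          # count * biased variance of this chunk
        factor = 1.0 / (n + count)
        var_n += v + (avg - m) ** 2 * n * count * factor
        avg = n * factor * avg + count * factor * m
        n += count

    # Statistics over the concatenated data for comparison
    all_points = torch.cat(chunks)
    print(avg, all_points.mean().item())                      # should match
    print(var_n / n, all_points.var(unbiased=False).item())   # should match

So each GPU’s mean and variance enter with their own count, which is exactly the weighted combination you describe.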