Is there a way to synchronize the BatchNorm layer across different GPUs, so that the mean and variance are computed over all devices during training?
I figured it out.
Sync the mean and variance across GPUs during the forward pass.
Sync gradMean and gradVar (the gradients with respect to the mean and variance) during the backward pass.
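To illustrate the forward-pass part: each device only needs to share three per-channel quantities — its element count, sum, and sum of squares — and an all-reduce of those is enough to recover the global mean and (biased) variance that BatchNorm uses. Below is a minimal sketch that simulates two devices with NumPy; in a real setup the three `sum(...)` reductions would each be a `dist.all_reduce` call (the function name `global_batchnorm_stats` is just for illustration, not from any library):

```python
import numpy as np

def global_batchnorm_stats(shards):
    """Combine per-device (batch, channels) shards into global mean/variance.

    Each device contributes its sample count, per-channel sum, and
    per-channel sum of squares; summing these across devices (an
    all_reduce in a real multi-GPU job) yields the global statistics.
    """
    n = sum(s.shape[0] for s in shards)
    total = sum(s.sum(axis=0) for s in shards)            # all_reduce(SUM) of sums
    total_sq = sum((s ** 2).sum(axis=0) for s in shards)  # all_reduce(SUM) of squared sums
    mean = total / n
    var = total_sq / n - mean ** 2                        # biased variance, as BatchNorm uses
    return mean, var

# Two simulated devices, each holding a shard of the batch
a = np.random.randn(8, 4)
b = np.random.randn(12, 4)
mean, var = global_batchnorm_stats([a, b])

# Matches the statistics of the full, unsharded batch
full = np.concatenate([a, b])
assert np.allclose(mean, full.mean(axis=0))
assert np.allclose(var, full.var(axis=0))
```

Note that recent PyTorch ships this built in as `torch.nn.SyncBatchNorm` (with `SyncBatchNorm.convert_sync_batchnorm` to convert an existing model), so hand-rolling the reduction is usually only needed for custom variants.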
I am facing the same issue and would like to see your solution, but the code can no longer be found at the link you mentioned. Could you please update the link so it points to the code?