In particular,
- Does each GPU separately compute its own batch norm parameters over the minibatch allocated to it, or do the GPUs communicate with each other to compute those parameters?
- If the GPUs compute these parameters independently, how do they combine them, say, for inference or evaluation mode, and when do they do it?
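
To make the two possibilities concrete, here is a small framework-independent sketch in plain Python. It assumes that "parameters" means the batch mean and variance used for normalization. The function names `per_device_stats` and `synced_stats` are hypothetical, and the "communication" is simulated by summing over lists rather than by a real all-reduce across GPUs. It does not claim to show what any particular library actually does.

```python
from statistics import fmean, pvariance

def per_device_stats(shards):
    """Case 1: each device computes mean/variance on its own shard only."""
    return [(fmean(s), pvariance(s)) for s in shards]

def synced_stats(shards):
    """Case 2: devices exchange partial sums, so all see whole-batch statistics."""
    n = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)                     # stands in for an all-reduce of sums
    total_sq = sum(sum(x * x for x in s) for s in shards)   # all-reduce of squared sums
    mean = total / n
    var = total_sq / n - mean ** 2                          # E[x^2] - (E[x])^2
    return mean, var

# Hypothetical data: one minibatch of 8 values split across 2 devices.
shards = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

local = per_device_stats(shards)
global_mean, global_var = synced_stats(shards)

print("per-device:", local)
print("synchronized:", (global_mean, global_var))
```

In this toy case the per-device variances are much smaller than the whole-batch variance, which is why I am asking which of the two is actually used.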