What is the running mean of BatchNorm if gradients are accumulated?

  1. Yes.
  2. Accumulated gradients will match the gradients of the full batch if you divide them by the number of accumulation iterations. See below for the reference I used.
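
As a rough illustration of point 2, here is a minimal sketch in plain Python rather than PyTorch. Everything in it is an assumption made for the example: the toy model `y = w * x`, the data, the helper `mse_grad`, equal-sized micro-batches, and a loss that is averaged over each batch. The model has no BatchNorm layer, so the sketch says nothing about running statistics, and with a layer that normalizes using per-batch statistics the match would likely be only approximate.

```python
# Hypothetical toy setup: a 1-D linear model y = w * x with a mean squared error loss.
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]

# Gradient from a single pass over the full batch.
full_grad = mse_grad(w, xs, ys)

# Gradient accumulated (summed) over equal-sized micro-batches.
num_iters = 4
micro = len(xs) // num_iters
accumulated = 0.0
for i in range(num_iters):
    start = i * micro
    accumulated += mse_grad(w, xs[start:start + micro], ys[start:start + micro])

# Dividing by the number of iterations recovers the full-batch gradient.
averaged_grad = accumulated / num_iters
print(full_grad, averaged_grad)
```

Without the final division, the accumulated value is `num_iters` times larger than the full-batch gradient, which in practice acts like an unintended increase of the learning rate.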