Is there a reason we use:
self.shadow[name] = new_average.clone()
instead of:
self.shadow[name] = new_average.detach().clone()
given that we don't actually care about propagating gradients to the shadow copies of the parameters?
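
For context, here is a minimal sketch of the kind of EMA helper the snippet above presumably comes from; the class name EMA, the decay value, and the update method are my assumptions, not the original code. It shows where the line in question sits, and how wrapping the update in torch.no_grad() (or calling detach()) keeps autograd history out of the shadow tensors:

import torch
import torch.nn as nn

class EMA:
    """Keeps exponential-moving-average "shadow" copies of model parameters.
    (Hypothetical sketch, not the original implementation.)"""

    def __init__(self, model: nn.Module, decay: float = 0.999):
        self.decay = decay
        # Shadow copies start as detached clones of the current parameters.
        self.shadow = {
            name: param.detach().clone()
            for name, param in model.named_parameters()
            if param.requires_grad
        }

    def update(self, model: nn.Module):
        # Under no_grad(), the arithmetic below produces tensors with no
        # autograd history, so detach() is redundant here; it only matters
        # if the update runs outside such a context.
        with torch.no_grad():
            for name, param in model.named_parameters():
                if not param.requires_grad:
                    continue
                new_average = (1.0 - self.decay) * param + self.decay * self.shadow[name]
                # The line in question: detach().clone() guarantees the stored
                # tensor carries no graph references, regardless of context.
                self.shadow[name] = new_average.detach().clone()

Typical usage would be something like:

model = nn.Linear(4, 2)
ema = EMA(model, decay=0.99)
# ... after each optimizer step:
ema.update(model)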