Is there a reason we use:
self.shadow[name] = new_average.clone()
instead of:
self.shadow[name] = new_average.detach().clone()
given that we don't actually care about propagating gradients to the shadow copies of the parameters?
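
For context, here is a minimal sketch of the kind of EMA helper the snippet above presumably comes from; the class name EMA, the decay value, and the update method are my assumptions, not the original code. It shows where the line in question sits, and how wrapping the update in torch.no_grad() (or calling detach()) keeps autograd history out of the shadow tensors:

import torch
import torch.nn as nn

class EMA:
    """Keeps exponential-moving-average "shadow" copies of model parameters.
    (Hypothetical sketch, not the original implementation.)"""

    def __init__(self, model: nn.Module, decay: float = 0.999):
        self.decay = decay
        # Shadow copies start as detached clones of the current parameters.
        self.shadow = {
            name: param.detach().clone()
            for name, param in model.named_parameters()
            if param.requires_grad
        }

    def update(self, model: nn.Module):
        # Under no_grad(), the arithmetic below produces tensors with no
        # autograd history, so detach() is redundant here; it only matters
        # if the update runs outside such a context.
        with torch.no_grad():
            for name, param in model.named_parameters():
                if not param.requires_grad:
                    continue
                new_average = (1.0 - self.decay) * param + self.decay * self.shadow[name]
                # The line in question: detach().clone() guarantees the stored
                # tensor carries no graph references, regardless of context.
                self.shadow[name] = new_average.detach().clone()

Typical usage would be something like:

model = nn.Linear(4, 2)
ema = EMA(model, decay=0.99)
# ... after each optimizer step:
ema.update(model)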