In ensure_shared_grads, wouldn’t shared_param.grad always be non-None after the first time the parameters are updated? In my testing, the function returns without syncing the gradient on every step after the first. Related discussion here.
def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad
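One way to check this is to test whether the two gradients still share memory after the second step. The snippet below is only a rough single-process sketch, not the A3C training loop: `run_step` and `grads_are_aliased` are hypothetical helpers I made up for the test, and two plain `nn.Linear` layers stand in for the local and shared models. It assumes a PyTorch version whose `zero_grad` accepts `set_to_none`.

```python
import torch
import torch.nn as nn


def ensure_shared_grads(model, shared_model):
    # Same logic as the function quoted above.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad


def run_step(model, shared_model, x, set_to_none):
    # Hypothetical helper: one backward pass followed by the sync call.
    model.zero_grad(set_to_none=set_to_none)
    model(x).sum().backward()
    ensure_shared_grads(model, shared_model)


def grads_are_aliased(model, shared_model):
    # True only if every shared grad points at the same memory as the local grad.
    return all(
        p.grad is not None
        and sp.grad is not None
        and p.grad.data_ptr() == sp.grad.data_ptr()
        for p, sp in zip(model.parameters(), shared_model.parameters())
    )


def aliased_after_two_steps(set_to_none):
    # Run two steps with different inputs and report whether the alias survived.
    torch.manual_seed(0)
    model, shared_model = nn.Linear(3, 2), nn.Linear(3, 2)
    for _ in range(2):
        run_step(model, shared_model, torch.randn(4, 3), set_to_none)
    return grads_are_aliased(model, shared_model)
```

If `aliased_after_two_steps(False)` returns True, the early return is harmless in that setup: the first call made shared_param.grad an alias of param.grad, and later backward passes write into the same tensor. If `aliased_after_two_steps(True)` returns False, then resetting gradients to None breaks the alias and the shared gradient goes stale, which would match what I see in my testing.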
In DeepMind’s paper they say they perform asynchronous updates without locks. What does that mean?
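As far as I understand it, "without locks" means each worker applies its update to the shared parameters with no synchronization, so concurrent read-modify-write operations can interleave and some updates may be overwritten. Here is a toy sketch of that idea using plain Python threads and a list in place of shared tensors. The names `lock_free_update` and `run_workers` are made up for illustration, and the real setup uses processes with shared-memory parameters rather than threads.

```python
import threading


def lock_free_update(shared_params, grads, lr):
    # No lock is taken: another worker may read or write shared_params
    # between this worker's read and its write.
    for i, g in enumerate(grads):
        shared_params[i] -= lr * g


def run_workers(num_workers=4, steps=1000, lr=0.01):
    shared_params = [0.0]

    def worker():
        # Every worker pushes the same constant gradient, to keep the toy simple.
        for _ in range(steps):
            lock_free_update(shared_params, [1.0], lr)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_params[0]
```

If no update were lost, the result would be -num_workers * steps * lr. Without a lock, the result can land anywhere between that value and -lr, depending on how the workers interleave.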
What will happen if shared_param.grad is changed by another thread after returning from
Will the gradient of the current thread be discarded?