In ensure_shared_grads, wouldn’t shared_param.grad NOT be None after the first time the parameters are updated? In my testing it seems that the function returns without syncing the gradient in every step after the first. Related discussion here.
```python
def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad
```
In DeepMind's paper they say they perform asynchronous updates without locks. What does that mean?
What happens if shared_param.grad is changed by another thread after ensure_shared_grads returns?
Will the current thread's gradient be discarded?
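Yeah, it's really only there to make sure initialization goes smoothly. On the first call, `shared_param._grad = param.grad` makes the shared parameter's gradient the very same tensor as the local one, so later backward passes on the local model write straight into it and there is nothing left to copy (as long as the gradients are zeroed in place rather than reset to None).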
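A small sketch of that aliasing, assuming a recent PyTorch and using a toy `nn.Linear` in place of the actual A3C model (the two models, the input, and the manual in-place zeroing are illustrative, not taken from the original code). After the first call both `.grad` attributes point at the same storage, so a second backward pass updates the shared gradient even though `ensure_shared_grads` returns immediately:

```python
import torch
import torch.nn as nn

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return  # already linked on an earlier call
        shared_param._grad = param.grad  # share the local grad tensor itself, not a copy

torch.manual_seed(0)
local = nn.Linear(3, 1)   # hypothetical stand-in for a worker's local model
shared = nn.Linear(3, 1)  # hypothetical stand-in for the shared model
x = torch.randn(4, 3)

# First backward pass, then the one-time linking of gradients.
local(x).sum().backward()
ensure_shared_grads(local, shared)
aliased = shared.weight.grad.data_ptr() == local.weight.grad.data_ptr()

# Zero the local grads IN PLACE and run another backward pass.
for p in local.parameters():
    p.grad.zero_()
local(2 * x).sum().backward()
ensure_shared_grads(local, shared)  # returns right away: grads are already linked

print(aliased)  # True
print(torch.equal(shared.weight.grad, local.weight.grad))  # True
```

The sketch resets gradients with an in-place `zero_()`. If the local gradients are reset to `None` instead (for example `zero_grad(set_to_none=True)`, the default in recent PyTorch releases), the next backward pass allocates a fresh tensor, the link is lost, and the early `return` would leave the shared gradient stale.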
Async updates without locks means that the updates from the parallel processes are applied asynchronously: each process runs on its own clock, and different processes can write to the shared parameters at the same time, which can occasionally produce bad updates (for example, one computed from parameters that another process has just changed). They accept that risk because it makes updates much faster: acquiring and releasing a lock around every synchronous update noticeably slows down the update rate. The hope is that the benefit of much more frequent updates outweighs the occasional bad update and speeds up training overall.
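A toy, Hogwild-style sketch of lock-free updates. It assumes threads instead of processes to stay self-contained (A3C implementations usually run workers as separate processes sharing memory) and a simple quadratic objective in place of the A3C loss; all function names here are hypothetical. Each worker reads a possibly stale snapshot of the shared parameters, computes a gradient, and writes the update in place. Passing `use_lock=True` wraps each read-compute-write in a lock, which is the synchronous alternative:

```python
import contextlib
import threading

import torch

def sgd_step(shared_w, target, lr):
    w = shared_w.clone()      # snapshot; another worker may be writing concurrently
    grad = w - target         # gradient of 0.5 * ||w - target||^2
    shared_w.sub_(lr * grad)  # in-place write into the shared tensor

def worker(shared_w, target, steps, lr, lock):
    # With no lock, nullcontext() makes the "with" a no-op, i.e. lock-free updates.
    guard = lock if lock is not None else contextlib.nullcontext()
    for _ in range(steps):
        with guard:
            sgd_step(shared_w, target, lr)

def train(num_workers=4, steps=200, lr=0.05, use_lock=False):
    target = torch.tensor([1.0, -2.0, 3.0])
    shared_w = torch.zeros(3)  # stands in for the shared model's parameters
    lock = threading.Lock() if use_lock else None
    threads = [
        threading.Thread(target=worker, args=(shared_w, target, steps, lr, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_w, target

w, target = train()
print(w)  # close to target despite unsynchronized writes
```

The lock-free version never makes a worker wait, at the cost that some updates are computed from slightly stale parameters; on this small, well-conditioned toy problem both variants end up essentially at the target.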