In the most popular A3C PyTorch implementation, there is a function
`ensure_shared_grads` that makes the local model and the globally shared model share gradients (so the shared optimizer can apply them).
In line 18,
`shared_param._grad = param.grad` is used to hand the local model's gradients over to the shared model. However, only
`shared_param.grad` is checked to decide whether the gradients are already shared in subsequent backward passes.
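For reference, here is a sketch of what that function looks like, based on my reading of the commonly cited ikostrikov/pytorch-a3c repo; the exact code and line numbering in the repo may differ slightly:

```python
import torch.nn as nn


def ensure_shared_grads(model: nn.Module, shared_model: nn.Module):
    # Walk the local and shared parameters in lock-step.
    for param, shared_param in zip(model.parameters(),
                                   shared_model.parameters()):
        # If the shared parameter already has a .grad tensor,
        # the gradients are assumed to be hooked up already.
        if shared_param.grad is not None:
            return
        # Otherwise point the shared parameter's gradient at the
        # local gradient tensor via the private ._grad attribute.
        shared_param._grad = param.grad
```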
What’s the difference between `shared_param._grad` and `shared_param.grad`?