Hi!
Thanks for the reply!
Is there a way to easily init the gradient buffer?
I also wondered where I can get the latest patch notes. The last time I checked on GitHub, there were only the notes for the 0.1.9 version.
Thanks again!
Hi,
I don’t know if you can initialise it by hand; @apaszke would have to answer this.
But I am wondering: what is the use case where you would want to access a gradient buffer before it contains anything useful?
As for the 0.1.10 release notes, I think Soumith is currently working on them and they should be on GitHub soon.
Hi there,
Thanks for the answer; I actually needed to access a gradient buffer to code a Variational Inference method: I wanted to update the gradients of mu (mean of the weights) and sigma (std) by hand.
I managed to get access to it by doing a “dummy” backprop in the __init__. That’s a quick and dirty solution, and I’m open to better ones.
Thanks again!
No, you can’t initialize it manually, but I don’t really see why you would need to do that (it’d be a tensor of zeros anyway). You can still access and modify the gradient, but only once the backward has been computed.
Doesn’t this work for you:
output = model(input)
loss(output).backward()
model.weight.grad.data # not None anymore. Can be modified
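A slightly fuller version of the same idea, as a minimal sketch (this assumes a recent PyTorch API where plain tensors replaced Variables; the layer size and the squared-error loss are just placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x = torch.randn(8, 4)

print(model.weight.grad)    # None: no backward has run yet
out = model(x)
out.pow(2).mean().backward()
print(model.weight.grad)    # now a tensor with the same shape as the weight
model.weight.grad.zero_()   # ...and it can be modified in place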
I also encountered this issue while experimenting with an A3C reinforcement learning algorithm.
The parameters of shared_model are updated using **gradients calculated by other workers**, so shared_model itself never actually runs a backward pass.
It would be really helpful to be able to manually init them.
Thanks!
I see, that’s actually a good point; we haven’t thought about that. We’ll have to solve it somehow.
How did you perform the “dummy” backprop?
I used the following workaround.
# Assumes `import torch`; if a shared parameter has no gradient buffer yet,
# run one dummy backward over the shared model so that .grad gets allocated.
for shared_param in shared_model.parameters():
    if not hasattr(shared_param.grad, 'data'):
        dummy_loss = 0
        for this_para in shared_model.parameters():
            dummy_loss += torch.mean(this_para)
        dummy_loss.backward()
        break
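For reference, on a recent PyTorch version a similar effect can be had without the dummy backward by allocating the buffers directly (a hedged sketch, assuming .grad is assignable as it is in current releases):

import torch

for p in shared_model.parameters():
    if p.grad is None:
        p.grad = torch.zeros_like(p.data)  # create an all-zero gradient buffer by hand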
So when exactly is it a problem? How are you implementing A3C? Are you sharing the main model parameters or does your training loop body look like this:
loss = fn(input)
loss.backward()
copy_grads_to_shared_model(model, shared_model)
shared_model_optimizer.step()
copy_params_to_local_model(model, shared_model)
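For illustration, here is a hedged sketch of what those two helpers could look like (copy_grads_to_shared_model and copy_params_to_local_model are only names used in the snippet above; these bodies are assumptions, not the actual pytorch-a3c code):

def copy_grads_to_shared_model(model, shared_model):
    # hand the locally computed gradients over to the shared copy of the parameters
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        shared_param._grad = param.grad

def copy_params_to_local_model(model, shared_model):
    # pull the freshly updated shared weights back into the local model
    model.load_state_dict(shared_model.state_dict())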
Hi,
I did exactly as @ypxie did. However, for Variational Bayes, I noticed I could do it by using a variational loss and autograd’s .backward() method. I had to replace the nn.Linear layers with Variable tensors so that the gradients pass through the draw of the network parameters and the net’s .forward() method.
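For anyone curious, a minimal sketch of that idea (written against the current tensor API rather than Variables; the names mu_w, rho_w and sample_linear are placeholders, not the code described above). The weight is drawn as mu + sigma * eps, so the gradients of the loss reach mu and sigma:

import torch
import torch.nn.functional as F

in_features, out_features = 4, 2
mu_w = torch.zeros(out_features, in_features, requires_grad=True)
rho_w = torch.full((out_features, in_features), -3.0, requires_grad=True)

def sample_linear(x):
    sigma_w = F.softplus(rho_w)            # keep the std positive
    eps = torch.randn_like(sigma_w)
    w = mu_w + sigma_w * eps               # reparameterised draw of the weights
    return x @ w.t()

x = torch.randn(8, in_features)
loss = sample_linear(x).pow(2).mean()      # stand-in for the variational loss
loss.backward()
print(mu_w.grad.shape, rho_w.grad.shape)   # gradients w.r.t. the mu and sigma parameters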
Thank you~
If using multiple threads, it seems that shared_model_optimizer.step() will not be safe?
Depends on what you consider unsafe
Thank you~ Regarding the safety issue, I noticed that DeepMind’s paper explicitly says they don’t put a lock on the shared weights.
Regarding your solution for sharing the grads in pytorch-a3c:
def ensure_shared_grads(model, shared_model):
    # point the shared parameters' grads at the local model's grads;
    # bail out early if they have already been set
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad
Will this code restrict the shared_model grad to being bound to only one local_model?
Because shared_model.grad will not be None after running this function once, other threads’ local_models won’t be able to change _grad anymore. Or will _grad not be accessible to other threads?
Hey,
did you figure it out? It seems your code works with this, right? Or is it still needed to do a global_i.grad.data = local_i.grad.data.clone() after this?
So what’s the verdict here? Should we just remove the if condition? I can see no reason for the check: from what I’ve seen, _grad is always accessible, and the global parameter’s grad is always updated to match the local one after I remove the if condition.
It seems like share_memory does not share the gradients, so the grads of shared_model will still be None in other processes even if the function has been run once in one process.
@apaszke, I wonder if I understand this correctly?
What is the difference between .grad and ._grad?
@apaszke @Soumith_Chintala do you know how to fix this? https://github.com/pytorch/pytorch/issues/5650
Please avoid posting issues in multiple places and tagging people like that. We look at all the issues; this just creates more noise.