Is there an easy way to initialize the gradient buffer?

I also wondered where I can get the latest patch notes. The last time I checked on GitHub, there were only the notes for version 0.1.9.

I don’t know if you can initialise it by hand, @apaszke would have to answer this.
But I am wondering what is the use case where you would like to access a gradient buffer before it contains anything useful?

As for the 0.1.10 release notes, I think Soumith is currently working on them and they should be on GitHub soon.

Thanks for the answer. I actually needed to access a gradient buffer to code a Variational Inference method: I wanted to update the gradients of mu (the mean of the weights) and sigma (their standard deviation) by hand.

I managed to get access to it by doing a “dummy” backprop in __init__. That’s a quick and dirty solution, and I’m open to better ones.

No, you can’t initialize it manually, but I don’t really see why you would need to do that (it’d be a tensor of zeros anyway). You can still access and modify the gradient, but only once the backward pass has been computed.

Doesn’t this work for you:

output = model(input)
loss(output).backward()  # the parameters' .grad is not None anymore and can be modified
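
To make the point above concrete, here is a minimal sketch. The model, data and loss are placeholders that are not from this thread, and it assumes a recent PyTorch release, so details may differ from 0.1.x:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)        # placeholder model
inputs = torch.randn(8, 4)     # placeholder batch

# Before any backward pass, the gradient buffers do not exist yet.
grads_before = [p.grad for p in model.parameters()]

loss = model(inputs).pow(2).mean()   # placeholder loss
loss.backward()

# After backward, every parameter holds a gradient tensor that can be edited in place.
for p in model.parameters():
    p.grad.mul_(0.5)   # e.g. rescale the gradient by hand
```

The in-place rescaling is only an example of a manual edit; any tensor operation on p.grad would work the same way.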

I also encountered this issue while experimenting with an A3C reinforcement learning algorithm.
The parameters of shared_model are updated using **gradients calculated by other workers**, so shared_model never actually runs a backward pass itself.

It would be really helpful to be able to manually init them.


I see, that’s actually a good point, we haven’t thought about that. We’ll have to solve it somehow
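
For readers on a recent PyTorch release, a possible sketch of manual initialization is shown below. It assumes that .grad accepts a tensor with the same shape, dtype and device as the parameter, which was not possible in the version discussed in this thread. The shared_model here is just a placeholder:

```python
import torch
import torch.nn as nn

shared_model = nn.Linear(4, 2)   # placeholder for the shared model

# Assumption: a recent PyTorch release where .grad can be assigned directly.
for p in shared_model.parameters():
    if p.grad is None:
        p.grad = torch.zeros_like(p)   # zero-filled gradient buffer
```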


How did you perform the “dummy” backprop?

I used the following workaround.

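        # If any shared parameter has no gradient yet, run one dummy backward
        # pass so that every parameter gets a gradient buffer.
        for shared_param in shared_model.parameters():
            if not hasattr(shared_param.grad, 'data'):
                dummy_loss = 0
                for this_para in shared_model.parameters():
                    dummy_loss += torch.mean(this_para)
                dummy_loss.backward()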

So when exactly is it a problem? How are you implementing A3C? Are you sharing the main model parameters or does your training loop body look like this:

loss = fn(input)
loss.backward()
copy_grads_to_shared_model(model, shared_model)
copy_params_to_local_model(model, shared_model)
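
A rough single-process sketch of such a loop body follows. The two helper names come from the pseudocode above, but their bodies, the optimizer step and the placeholder model and loss are my assumptions, written for a recent PyTorch release:

```python
import torch
import torch.nn as nn

def copy_grads_to_shared_model(model, shared_model):
    # Hand the local gradients over to the shared parameters.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        shared_param.grad = param.grad.clone()

def copy_params_to_local_model(model, shared_model):
    # Refresh the local worker with the latest shared weights.
    model.load_state_dict(shared_model.state_dict())

torch.manual_seed(0)
shared_model = nn.Linear(4, 2)   # placeholder shared model
model = nn.Linear(4, 2)          # placeholder local worker model
copy_params_to_local_model(model, shared_model)
optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.1)

before = [p.detach().clone() for p in shared_model.parameters()]

# One iteration of the loop body.
model.zero_grad()
loss = model(torch.randn(8, 4)).pow(2).mean()   # placeholder loss
loss.backward()
copy_grads_to_shared_model(model, shared_model)
optimizer.step()                                # assumed: the step uses the shared optimizer
copy_params_to_local_model(model, shared_model)
```

In this variant the shared model never calls backward itself; it only receives gradients computed by the local model.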

I’ve sent a PR with a simpler solution to the problem.

I did exactly as @ypxie did. However, for Variational Bayes, I noticed I could get by with a variational loss and autograd's .backward() method. I had to replace the nn.Linear layers with Variable tensors so that the gradients could pass through both the draw of the network parameters and the net’s .forward() method.
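
A minimal sketch of that idea is below, using the reparameterization trick for a single weight matrix. The shapes, the log_sigma parameterization, the standard normal prior, the squared-error term and the 1e-3 weighting are all illustrative assumptions, not the code I actually used, and it is written for a recent PyTorch release where plain tensors replace Variable:

```python
import torch

torch.manual_seed(0)

# Variational parameters of one weight matrix (hypothetical shapes).
mu = torch.zeros(2, 4, requires_grad=True)
log_sigma = torch.full((2, 4), -3.0, requires_grad=True)  # sigma = exp(log_sigma) > 0

x = torch.randn(8, 4)   # placeholder inputs
y = torch.randn(8, 2)   # placeholder targets

sigma = log_sigma.exp()
eps = torch.randn_like(mu)
w = mu + sigma * eps            # reparameterized draw of the weights

pred = x @ w.t()                # forward pass with the sampled weights
nll = (pred - y).pow(2).mean()  # placeholder data-fit term

# KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights.
kl = 0.5 * (sigma.pow(2) + mu.pow(2) - 1.0 - 2.0 * log_sigma).sum()

loss = nll + 1e-3 * kl          # arbitrary weighting, for illustration only
loss.backward()                 # fills mu.grad and log_sigma.grad
```

Because the draw is written as mu + sigma * eps, autograd can propagate the loss gradient back to both mu and log_sigma without any manual gradient bookkeeping.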

If using multiple threads, it seems that shared_model_optimizer.step() will not be safe?

Depends on what you consider unsafe :slight_smile:

Thank you! Regarding the safety issue, I noticed that the DeepMind paper explicitly says they don’t put a lock on the shared weights.

Regarding your solution for the shared gradients in pytorch-a3c:

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad

Will this code restrict the shared_model gradients to being bound to only one local_model?
After this function has run once, the shared_model gradients will no longer be None, so the other local_model threads won’t be able to change _grad anymore. Or will _grad not be accessible to other threads?


Did you figure it out? It seems your code works with this, right? Or is it still necessary to do an assignment (=) after this?

So what’s the verdict here? Should we just remove the if condition? Because I can see no reason for the check. From what I’ve seen _grad is always accessible, and the global parameter grad is always updated to match the local, after I remove the if condition.

It seems like share_memory does not share the gradients, so shared_model.grad will still be None for other processes even if the function has already been run once in one process.

@apaszke, I wonder if I understand this correctly?

What is the difference between .grad and ._grad ?

@apaszke @Soumith_Chintala do you know how to fix this?

Please avoid posting issues in multiple places and tagging people like that. We look at all the issues; this just creates more noise.