Problem with Variable.grad.data?

Hi all,
I just installed the latest version of PyTorch (0.1.10) on a new computer and noticed that the grad seems to be a bit faulty:

x = torch.Tensor(5, 5).normal_()
x = Variable(x, requires_grad=True)
print(x.grad.data)
AttributeError: 'NoneType' object has no attribute 'data'

print(x.grad)
None

dir(x.grad)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

Whereas it works fine on my personal computer, on PyTorch ‘0.1.9+aaf41c6’. Is this an issue, or is there something I don’t know about?

Thanks a lot !

Hi,

As of a very recent version, the gradient buffers are only initialised when they are needed.
If you actually backprop anything, it will work properly:

x = torch.Tensor(5, 5).normal_()
x = Variable(x, requires_grad=True)
print(hasattr(x.grad, "data"))  # prints False
x.add(1).sum().backward()
print(hasattr(x.grad, "data"))  # prints True

Hi!
Thanks for the reply!

Is there an easy way to initialise the gradient buffer?

I also wondered where I can find the latest release notes. The last time I checked on GitHub, there were only notes for the 0.1.9 version.

Thanks again !

Hi,

I don’t know if you can initialise it by hand; @apaszke would have to answer that.
But I am wondering: what is the use case where you would want to access a gradient buffer before it contains anything useful?

As for the 0.1.10 release notes, I think Soumith is currently working on them and they should be on GitHub soon.

Hi there,
Thanks for the answer! I needed to access a gradient buffer to implement a Variational Inference method: I wanted to update the gradients of mu (the mean of the weights) and sigma (the std) by hand.

I managed to get access to it by doing a “dummy” backprop in __init__. That’s a quick and dirty solution, and I’m open to better ones.
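
For reference, here is a minimal sketch of that trick (the mu/sigma Variables below are just stand-ins for my actual VI parameters):

import torch
from torch.autograd import Variable

mu = Variable(torch.zeros(5, 5), requires_grad=True)
sigma = Variable(torch.ones(5, 5), requires_grad=True)

# Throwaway backward pass whose only purpose is to make PyTorch
# allocate the .grad buffers.
(mu.sum() + sigma.sum()).backward()
mu.grad.data.zero_()     # discard the dummy gradients
sigma.grad.data.zero_()
# mu.grad.data and sigma.grad.data can now be modified by hand.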

Thanks again !

No, you can’t initialize it manually, but I don’t really see why you would need to do that (it’d be a tensor of zeros anyway). You can still access and modify the gradient, but only once the backward pass has been computed.

Doesn’t this work for you:

output = model(input)
loss(output).backward()
model.weight.grad.data # not None anymore. Can be modified

I also encountered this issue while experimenting with an A3C reinforcement learning algorithm.
The parameters of shared_model are updated using gradients calculated by other workers, so shared_model itself never runs a backward pass.

It would be really helpful to be able to initialise them manually.
Thanks!

I see, that’s actually a good point; we haven’t thought about that. We’ll have to solve it somehow.

How did you perform the “dummy” backprop?

I used the following workaround.

for shared_param in shared_model.parameters():
    if not hasattr(shared_param.grad, 'data'):
        # Build a throwaway scalar loss over all parameters and
        # backprop it once, just to force allocation of the .grad
        # buffers of shared_model.
        dummy_loss = 0
        for this_param in shared_model.parameters():
            dummy_loss += torch.mean(this_param)
        dummy_loss.backward()
        break

So when exactly is it a problem? How are you implementing A3C? Are you sharing the main model parameters, or does your training loop body look like this:

loss = fn(input)
loss.backward()
copy_grads_to_shared_model(model, shared_model)
shared_model_optimizer.step()
copy_params_to_local_model(model, shared_model)
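
For reference, a minimal sketch of what those two helpers might look like (the names come from the pseudocode above, not from any PyTorch API):

def copy_grads_to_shared_model(model, shared_model):
    # Bind each local gradient buffer to the corresponding shared
    # parameter. Assigning through ._grad works even while
    # shared_param.grad is still None, which sidesteps the lazy
    # initialisation discussed above.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        shared_param._grad = param.grad

def copy_params_to_local_model(model, shared_model):
    # Pull the freshly optimised shared weights back into the local copy.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        param.data.copy_(shared_param.data)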

I’ve sent a PR with a simpler solution to the problem.

Hi,
I did exactly what @ypxie did. However, for Variational Bayes, I noticed I could get by with a variational loss and autograd’s .backward() method. I had to replace the nn.Linear layers with Variable tensors, so that the gradients pass through the draw of the network parameters and the net’s .forward() method.
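
For illustration, a minimal sketch of that setup (the names mu and log_sigma and the single linear layer are hypothetical, and the loss is a stand-in for the real variational loss):

import torch
from torch.autograd import Variable

# Variational parameters replacing an nn.Linear weight.
mu = Variable(torch.zeros(10, 5), requires_grad=True)
log_sigma = Variable(torch.zeros(10, 5), requires_grad=True)

def forward(x):
    # Reparameterisation trick: the sampled weight is a differentiable
    # function of mu and log_sigma, so .backward() reaches both.
    eps = Variable(torch.randn(mu.size()))
    weight = mu + torch.exp(log_sigma) * eps
    return x.mm(weight.t())

x = Variable(torch.randn(3, 5))
loss = forward(x).pow(2).mean()  # stand-in for the variational loss
loss.backward()
print(mu.grad.data)        # gradients w.r.t. the variational parameters
print(log_sigma.grad.data)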

Thank you~
When using multiple threads, it seems that shared_model_optimizer.step() will not be safe?

Depends on what you consider unsafe 🙂

Thank you~ On the safety issue, I noticed that DeepMind’s paper explicitly says they don’t put a lock on the shared weights.

Regarding your solution for the shared grads in pytorch-a3c:

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        # If the shared grads are already bound, there is nothing to do.
        if shared_param.grad is not None:
            return
        # Bind the local grad buffer via the writable ._grad attribute.
        shared_param._grad = param.grad

Will this code restrict the shared_model grads to being bound to only one local_model?
Because shared_model.grad will not be None after this function has run once, other threads’ local_models won’t be able to change _grad anymore. Or will _grad not be accessible to other threads?

Hey,
did you figure it out? It seems your code works with this, right? Or is it still necessary to do global_i.grad.data = local_i.grad.data.clone() after this?

So what’s the verdict here? Should we just remove the if condition? I can see no reason for the check: from what I’ve seen, _grad is always accessible, and once I remove the if condition the global parameter grads are always updated to match the local ones.

It seems like share_memory does not share the gradients, so shared_model.grad will still be None in other processes even if the function has been run once in one process.
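
A minimal sketch of what I mean (two worker processes on a shared model; assumes the fork start method):

import torch
from torch.autograd import Variable
import torch.multiprocessing as mp

def run_backward(shared_model):
    # backward() runs in this process only, so the lazily allocated
    # .grad buffer stays local to it.
    shared_model(Variable(torch.randn(1, 2))).sum().backward()
    print('after backward:', shared_model.weight.grad is not None)  # True

def check_grad(shared_model):
    # A different process never sees that buffer.
    print('other process:', shared_model.weight.grad)  # None

if __name__ == '__main__':
    model = torch.nn.Linear(2, 2)
    model.share_memory()
    for fn in (run_backward, check_grad):
        p = mp.Process(target=fn, args=(model,))
        p.start()
        p.join()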

@apaszke, I wonder if I understand this correctly?

What is the difference between .grad and ._grad?
