Problem on Variable.grad.data?

D_Kay · March 8, 2017, 5:32pm

Hi all,
I just installed the latest version of PyTorch (0.1.10) on a new computer and noticed that grad seems to be a bit faulty:
x=torch.Tensor(5,5).normal_()
x=Variable(x,requires_grad=True)
print(x.grad.data)
AttributeError: 'NoneType' object has no attribute 'data'

print(x.grad)
None

dir(x.grad)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

Whereas it works fine on my personal computer, on PyTorch ‘0.1.9+aaf41c6’. Is this a bug, or is there something I don’t know about?

Thanks a lot !

albanD · March 8, 2017, 5:39pm

Hi,

Since a very recent version, the gradient buffers are initialised only when needed.
If you actually backprop anything, it will work properly:

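import torch
from torch.autograd import Variable

x = torch.Tensor(5, 5).normal_()
x = Variable(x, requires_grad=True)
print(hasattr(x.grad, "data"))  # prints False: x.grad is still None
x.add(1).sum().backward()       # any backward pass allocates the buffer
print(hasattr(x.grad, "data"))  # prints True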
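For readers on a recent PyTorch release, where Variable has been merged into Tensor, the same lazy behaviour can be checked with the sketch below. It assumes `torch.randn(..., requires_grad=True)` in place of the 0.1.10 Variable wrapper, and the helper name `grad_is_ready` is made up for illustration.

```python
import torch

def grad_is_ready(t):
    # Hypothetical helper: True once a gradient tensor has been allocated for t.
    return t.grad is not None

# Assumed modern stand-in for Variable(torch.Tensor(5, 5).normal_(), requires_grad=True).
x = torch.randn(5, 5, requires_grad=True)
before = grad_is_ready(x)    # no backward pass yet, so x.grad is None

x.add(1).sum().backward()    # d/dx sum(x + 1) is 1 for every element
after = grad_is_ready(x)

print(before, after)
```

Testing `x.grad is None` directly is usually clearer than the `hasattr(x.grad, "data")` check, since it states the actual condition.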

D_Kay · March 8, 2017, 10:17pm

Hi!
Thanks for the reply!

Is there an easy way to initialise the gradient buffer?

I also wondered where I can find the latest release notes. The last time I checked on GitHub, there were only the notes for version 0.1.9.

Thanks again !

albanD · March 9, 2017, 10:18am

Hi,

I don’t know if you can initialise it by hand; @apaszke would have to answer this.
But I am wondering: what is the use case for accessing a gradient buffer before it contains anything useful?

As for the 0.1.10 release notes, I think Soumith is currently working on them and they should be on GitHub soon.

D_Kay · March 9, 2017, 10:48am

Hi there,
Thanks for the answer. I actually needed to access a gradient buffer to code a Variational Inference method: I wanted to update the gradients of mu (the mean of the weights) and sigma (the std) by hand.

I managed to get access to it by doing a “dummy” backprop in __init__. That’s a quick and dirty solution, and I’m open to better ones.

Thanks again !

apaszke · March 9, 2017, 11:16am

No, you can’t initialize it manually, but I don’t really see why you would need to do that (it’d be a tensor of zeros anyway). You can still access and modify the gradient, but only once the backward pass has been computed.

Doesn’t this work for you:

output = model(input)
loss(output).backward()
model.weight.grad.data # not None anymore. Can be modified
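A self-contained version of that pattern might look like the sketch below. It is only an illustration under assumptions: it uses a recent PyTorch API rather than 0.1.10, a small `nn.Linear` stands in for `model`, a squared-output mean stands in for `loss`, and halving the gradient is an arbitrary example of a manual edit.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)                       # stand-in for the thread's `model`
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(4, 3)
loss = model(inputs).pow(2).mean()            # stand-in for `loss(output)`
loss.backward()                               # allocates model.weight.grad

old_weight = model.weight.detach().clone()
with torch.no_grad():
    model.weight.grad.mul_(0.5)               # hand-edit the gradient in place
scaled_grad = model.weight.grad.clone()

optimizer.step()                              # plain SGD applies the edited gradient
expected_weight = old_weight - 0.1 * scaled_grad
```

The optimizer only sees whatever is stored in `.grad` when `step()` is called, so edits made between `backward()` and `step()` take effect.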

ypxie · March 13, 2017, 7:04pm

I also ran into this issue while experimenting with an A3C reinforcement learning algorithm.
The parameters of shared_model are updated using **gradients calculated by other workers**, so shared_model never actually runs a backward pass itself.

It would be really helpful to be able to manually init them.
Thanks!

apaszke · March 13, 2017, 7:29pm

I see, that’s actually a good point; we hadn’t thought about that. We’ll have to solve it somehow.

Ilya_Kostrikov · March 14, 2017, 2:12pm

How did you perform the “dummy” backprop?

ypxie · March 14, 2017, 4:04pm

I used the following workaround.

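# If the shared parameters have no gradient buffers yet, run one dummy
# backward pass so that every parameter gets a .grad.
for shared_param in shared_model.parameters():
    if not hasattr(shared_param.grad, 'data'):
        dummy_loss = 0
        for this_para in shared_model.parameters():
            dummy_loss += torch.mean(this_para)
        dummy_loss.backward()
        break  # one backward pass initialises all buffers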

apaszke · March 14, 2017, 6:49pm

So when exactly is it a problem? How are you implementing A3C? Are you sharing the main model parameters or does your training loop body look like this:

loss = fn(input)
loss.backward()
copy_grads_to_shared_model(model, shared_model)
shared_model_optimizer.step()
copy_params_to_local_model(model, shared_model)
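One way the two copy helpers in that loop could be written is sketched below. The helper bodies are assumptions (the thread only gives their names), the code runs in a single process with a recent PyTorch API, and a small `nn.Linear` with a squared-output loss stands in for the real model and `fn(input)`.

```python
import torch
import torch.nn as nn

def copy_grads_to_shared_model(model, shared_model):
    # Assumed implementation: copy each local gradient into the shared parameter.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if param.grad is None:
            continue                                 # parameter took no part in the loss
        if shared_param.grad is None:
            shared_param.grad = param.grad.clone()   # first use: allocate by cloning
        else:
            shared_param.grad.copy_(param.grad)      # later uses: overwrite in place

def copy_params_to_local_model(model, shared_model):
    # Assumed implementation: pull the updated shared weights into the local model.
    model.load_state_dict(shared_model.state_dict())

torch.manual_seed(0)
shared_model = nn.Linear(3, 1)
model = nn.Linear(3, 1)
copy_params_to_local_model(model, shared_model)
shared_model_optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.1)

loss = model(torch.randn(4, 3)).pow(2).mean()        # stand-in for fn(input)
loss.backward()
copy_grads_to_shared_model(model, shared_model)
shared_model_optimizer.step()
copy_params_to_local_model(model, shared_model)
```

Because the gradients are copied rather than computed on `shared_model`, the shared model never needs its own backward pass, which is exactly the situation where an uninitialised `.grad` shows up.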

apaszke · March 14, 2017, 9:40pm

I’ve sent a PR with a simpler solution to the problem.

D_Kay · March 17, 2017, 3:33pm

Hi,
I did exactly as @ypxie did. However, for Variational Bayes, I noticed I could instead use a variational loss together with autograd’s .backward() method. I had to replace the nn.Linear layers with Variable tensors so that the gradients could pass through the draw of the network parameters and the net’s .forward() method.

ypxie · March 17, 2017, 4:35pm

Thank you~
When using multiple threads, it seems that shared_model_optimizer.step() will not be safe?

apaszke · March 18, 2017, 9:41pm

Depends on what you consider unsafe

ypxie · March 19, 2017, 9:00pm

Thank you~ Regarding safety, I noticed that the DeepMind paper explicitly says they don’t put a lock on the shared weights.

About your solution for the shared grads in pytorch-a3c:

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad

Will this code restrict the shared_model grads to being bound to only one local_model?
The shared_model grads will not be None after this function has run once, so the other local_model threads won’t be able to change _grad anymore. Or will _grad not be accessible to other threads?
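Within a single process, the effect of the early return can be checked with a sketch like the one below. It assumes a recent PyTorch in which `_grad` can still be assigned, uses two small `nn.Linear` models as stand-ins for the local models, and says nothing about the multi-process case, where each worker holds its own Python-side view of the shared parameters.

```python
import torch
import torch.nn as nn

def ensure_shared_grads(model, shared_model):
    # Same logic as the function quoted above.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad

def run_backward(model):
    # Give every parameter of the model a gradient.
    model(torch.ones(2, 3)).sum().backward()

shared_model = nn.Linear(3, 1)
model_a, model_b = nn.Linear(3, 1), nn.Linear(3, 1)
run_backward(model_a)
run_backward(model_b)

ensure_shared_grads(model_a, shared_model)   # binds the shared grads to model_a's
ensure_shared_grads(model_b, shared_model)   # returns early: grads are already set

shared_ptr = shared_model.weight.grad.data_ptr()
bound_to_a = shared_ptr == model_a.weight.grad.data_ptr()
bound_to_b = shared_ptr == model_b.weight.grad.data_ptr()
print(bound_to_a, bound_to_b)
```

In this single-process setting the shared gradient aliases the first model's gradient storage and the second call changes nothing, which is the behaviour the question describes.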

jingweiz · March 29, 2017, 12:27pm

Hey,
did you figure it out? It seems your code works with this, right? Or is it still necessary to do global_i.grad.data = local_i.grad.data.clone() after this?

longhuei · May 26, 2017, 9:51pm

So what’s the verdict here? Should we just remove the if condition? Because I can see no reason for the check. From what I’ve seen _grad is always accessible, and the global parameter grad is always updated to match the local, after I remove the if condition.

jhliew · July 8, 2017, 11:23am

It seems like share_memory does not share the gradients, so shared_model.grad will still be None in other processes even if the function has already been run once in one process.

@apaszke, I wonder if I understand this correctly?

xuehy · July 11, 2017, 9:12am

What is the difference between .grad and ._grad?