Hi all,
I just installed the latest version of PyTorch (0.1.10) on a new computer and noticed that the grad attribute seems a bit faulty:

x = torch.Tensor(5, 5).normal_()
x = Variable(x, requires_grad=True)
print(x.grad.data)
AttributeError: 'NoneType' object has no attribute 'data'
I don't know if you can initialise it by hand; @apaszke would have to answer this.
But I am wondering: what is the use case for accessing a gradient buffer before it contains anything useful?
As for the 0.1.10 release notes, I think Soumith is currently working on them and they should be on GitHub soon.
Hi there,
Thanks for the answer. I actually needed to access a gradient buffer to implement a Variational Inference method: I wanted to update the gradients of mu (the mean of the weights) and sigma (the std) by hand.
I managed to get access to it by doing a "dummy" backprop in __init__. That's a quick and dirty solution, and I'm open to better ones.
No, you can't initialize it manually, but I don't really see why you would need to do that (it'd be a tensor of zeros anyway). You can still access and modify the gradient, but only once the backward pass has been computed.
Doesn't this work for you:
output = model(input)
loss(output).backward()
model.weight.grad.data # not None anymore. Can be modified
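To make that pattern concrete, here is a minimal self-contained sketch using a tiny nn.Linear model (the model and shapes are just for illustration, not from the thread):

```python
import torch
from torch.autograd import Variable

# hypothetical tiny model: a single linear layer
model = torch.nn.Linear(4, 1)
x = Variable(torch.randn(3, 4))

# before any backward pass, the gradient buffer is None
assert model.weight.grad is None

loss = model(x).sum()
loss.backward()

# after backward, .grad is populated and can be modified in place
print(model.weight.grad is None)  # False
model.weight.grad.data.zero_()
```

The key point is that the grad buffer only comes into existence once backward has run at least once; after that you are free to overwrite it.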
I also encountered this issue while experimenting with an A3C reinforcement learning algorithm.
The parameters of shared_model are updated using **gradients calculated by other workers**. So shared_model itself never actually runs a backward pass.
It would be really helpful to be able to manually init them.
Thanks!
for shared_param in shared_model.parameters():
    if not hasattr(shared_param.grad, 'data'):
        dummy_loss = 0
        for this_para in shared_model.parameters():
            dummy_loss += torch.mean(this_para)
        dummy_loss.backward()
    break
So when exactly is it a problem? How are you implementing A3C? Are you sharing the main model parameters or does your training loop body look like this:
loss = fn(input)
loss.backward()
copy_grads_to_shared_model(model, shared_model)
shared_model_optimizer.step()
copy_params_to_local_model(model, shared_model)
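The helper names in the loop above are pseudocode, not library functions. One way the grad-copying step could be sketched (assuming both models have identical parameter layouts) is:

```python
import torch

def copy_grads_to_shared_model(model, shared_model):
    # hypothetical helper: copy each local gradient into the shared
    # model's grad buffers, allocating them on first use
    for param, shared_param in zip(model.parameters(),
                                   shared_model.parameters()):
        if param.grad is None:
            continue
        if shared_param.grad is None:
            # the shared model never ran backward, so create the buffer
            shared_param.grad = param.grad.clone()
        else:
            shared_param.grad.copy_(param.grad)

# usage sketch
local = torch.nn.Linear(3, 2)
shared = torch.nn.Linear(3, 2)
local(torch.randn(4, 3)).sum().backward()
copy_grads_to_shared_model(local, shared)
```

Allocating the buffer lazily on first use sidesteps the "grad is None before backward" problem for the shared model entirely.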
Hi,
I did exactly as @ypxie did. However, for Variational Bayes, I noticed I could get by with a variational loss and autograd's .backward() method. I had to replace the nn.Linear layers with Variable tensors, so that the gradients would flow through the draw of the network parameters and the net's .forward() method.
Thank you~ As for the safety issue, I noticed that DeepMind's paper explicitly said they don't put a lock on the shared weights.
Regarding your solution for the shared grads in pytorch-a3c:
def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad
Will this code restrict the shared_model grads to being bound to only one local_model?
Because shared_model.grad will no longer be None after this function has run once, other threads' local_models won't be able to change _grad anymore. Or will _grad not be accessible to the other threads?
Hey,
did you figure it out? It seems your code works with this, right? Or do you still need to do a global_i.grad.data = local_i.grad.data.clone() after this?
So what's the verdict here? Should we just remove the if condition? I can see no reason for the check. From what I've seen, _grad is always accessible, and the global parameter's grad is always updated to match the local one after I remove the if condition.
It seems like share_memory does not share the gradients, so shared_model's grads will still be None in other processes even if the function has been run once in one process.
@apaszke, I wonder if I understand this correctly?
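If that reading is right, a quick single-process check is at least consistent with it: share_memory() allocates no gradient buffers by itself (this is just a sketch, not a cross-process test):

```python
import torch

model = torch.nn.Linear(2, 2)
model.share_memory()  # moves parameter storage into shared memory

# share_memory touches the parameter tensors, not their grad buffers,
# so the grads are still unallocated afterwards
for p in model.parameters():
    print(p.grad)  # None
```

So each worker process would still need something like ensure_shared_grads (or a backward pass) before the grad buffers exist at all.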