I am trying to implement the Actor-Critic algorithm with eligibility traces. As described in the algorithm, I need to initialize a trace vector with the same number of elements as the network parameters to zero, and then update it manually. At the end I need to update both the Actor and the Critic network parameters manually, without using the optimizer.step() function. Is it possible?
Yes of course.
You can just create all these Tensors and then, instead of using an optimizer, write the new values into the parameters yourself:

```python
with torch.no_grad():
    params.copy_(new_params)
```

`torch.no_grad()` is important to make sure that this is not recorded in the computational graph!
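For the eligibility traces themselves, a minimal sketch of that initialization and a manual trace update (the critic network, discount `gamma`, and trace decay `lam` here are placeholders, not from the thread):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder critic; any nn.Module works the same way (same idea for the actor).
critic = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

# One trace tensor per parameter tensor, same shape, initialized to zero.
traces = [torch.zeros_like(p) for p in critic.parameters()]

# After a backward pass, update the traces manually, e.g. the usual
# accumulating rule z <- gamma * lambda * z + grad.
gamma, lam = 0.99, 0.9
loss = critic(torch.randn(1, 4)).sum()
loss.backward()
with torch.no_grad():
    for z, p in zip(traces, critic.parameters()):
        z.mul_(gamma * lam).add_(p.grad)
```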
Hi, thanks for the reply… can you elaborate in more detail with a small example?
Instead of doing the classical:

```python
pred = model(inp)
loss = criterion(pred, ground_truth)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
You can compute the updates by hand and then set them into the weights. This is basically what SGD does, where `update_function` returns `p - lr * p.grad`:

```python
pred = model(inp)
loss = your_loss(pred)
model.zero_grad()
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        new_val = update_function(p, p.grad, loss, other_params)
        p.copy_(new_val)
```
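Filling in the placeholders, a runnable sketch with a toy model and an SGD-style rule (the model, data, and learning rate are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)            # toy model standing in for the actor/critic
inp = torch.randn(8, 3)
target = torch.randn(8, 1)

def update_function(p, grad, lr=0.1):
    # SGD-style rule; replace with the update your algorithm prescribes.
    return p - lr * grad

pred = model(inp)
loss = nn.functional.mse_loss(pred, target)
model.zero_grad()
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        p.copy_(update_function(p, p.grad))
```

After the update the parameters keep `requires_grad=True`, so the next forward/backward pass works as usual.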
Can you tell me: have you defined update_function(p, p.grad, loss, other_params) manually, or is it already in the PyTorch documentation? I could not find it. And how can I do the same thing with Adam?
You have to implement it yourself so that it computes the new value that you want. In your case I guess it should compute what is in the paper you cite.
Ok, thanks. Still I am confused about how I can define it and cross-verify whether it's true or not. Could you help with that?
Sorry, I don’t understand the question “how can I define like and cross verify whether its true or not?” Could you explain, please?
Like in my case: I define update_function and then update the parameters, so how can I check whether they were truly updated or not? For example with f = x**2, I know the gradient is 2x, so I can verify it manually like that.
In my case above there are two neural networks, and I have to update both networks' parameters manually with one function, so implementation-wise I am clueless how I can achieve this w.r.t. the above algorithm.
What do you mean by “its true updated”? Do you mean the correct gradients for your function? Well, they most certainly are not (in the sense that your function doesn’t have proper gradients), otherwise you could just use autograd to compute them; you would not have to compute them by hand.
I am not familiar with the algorithm you shared above, but if you have two models you can loop through model1's parameters and then model2's parameters. That depends on what the algorithm is doing, though; you'll need to look into that.
I can't say for your problem, as I don't know it.
If you were doing SGD-like updates, it would be:

```python
def update_function(param, grad, loss, learning_rate):
    return param - learning_rate * grad
```
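One way to cross-verify such a rule is to run it side by side with `torch.optim.SGD` on a copy of the same model and check that the parameters agree; a sketch:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
lr = 0.05
model_a = nn.Linear(2, 1)
model_b = copy.deepcopy(model_a)   # identical starting weights
inp = torch.randn(4, 2)
target = torch.randn(4, 1)

def update_function(param, grad, learning_rate):
    return param - learning_rate * grad

# Manual update on model_a.
loss_a = nn.functional.mse_loss(model_a(inp), target)
model_a.zero_grad()
loss_a.backward()
with torch.no_grad():
    for p in model_a.parameters():
        p.copy_(update_function(p, p.grad, lr))

# Reference update via the built-in optimizer on model_b.
opt = torch.optim.SGD(model_b.parameters(), lr=lr)
loss_b = nn.functional.mse_loss(model_b(inp), target)
opt.zero_grad()
loss_b.backward()
opt.step()
```

If the rule is implemented correctly, both models end up with the same parameters after one step (for plain SGD without momentum or weight decay).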
Thanks, no problem… thanks for this much help.
More precisely, I am talking about this snippet of code:
```python
with torch.no_grad():
    weights -= weights.grad * lr
    bias -= bias.grad * lr
    weights.grad.zero_()
    bias.grad.zero_()
```
Whenever I try to run something like this, the `requires_grad` property of `weights` and `bias` is automatically set to `False`, and I suspect this is because `weights` and `bias` are being re-assigned inside a `torch.no_grad()` context. However, if I use `copy_` inside a `torch.no_grad()` context as you mentioned above, it works fine. I think this is because the values are changed in-place. Could you please advise?
In the nn tutorial, they are not re-assigned: `-=` is actually changing the tensor in place. This is why in this sample I use `.copy_()`, to be able to modify it in place with a set value.
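This is easy to check directly on a small tensor; a sketch:

```python
import torch

w = torch.ones(3, requires_grad=True)
original_storage = w.data_ptr()

with torch.no_grad():
    w -= 0.1                      # in-place: same storage, still a leaf
same_storage = w.data_ptr() == original_storage
still_requires_grad = w.requires_grad

with torch.no_grad():
    w = w - 0.1                   # re-assignment: a brand-new tensor, created
                                  # under no_grad, so it doesn't require grad
new_requires_grad = w.requires_grad

print(same_storage, still_requires_grad, new_requires_grad)  # True True False
```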
So operations like `-=` modify a tensor in-place? I thought `a -= b` was shorthand for `a = a - b`, like in normal Python?
No, actually in normal Python the two are very different. The first one calls `__iadd__` on `a`, and that's it. The second calls `__add__` and assigns the result back to the variable named `a`. So PyTorch respects this convention: the first one changes the Tensor in place, while the second does not.
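A plain-Python illustration of the difference (using lists, where it is easiest to see):

```python
# list.__iadd__ mutates the object in place, while `a = a + b`
# builds a new object and rebinds the name.
a = [1, 2]
alias = a            # a second name for the same list
a += [3]             # __iadd__: mutates the existing list
after_iadd = alias   # the alias sees the change: [1, 2, 3]

a = a + [4]          # __add__: a new list; `a` is rebound, alias is unchanged
```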
Wow, did not know that. Thanks for the info!
```python
for p in meta_i3d.parameters():
    new_val = p - 0.0001 * p.grad
    p.copy_(new_val)
```
I am doing this to update parameter values, but I am getting an error: “RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.”
Is there any workaround for this? I am working on meta-learning and need to update the weights without calling optimizer.step().
You just need to wrap the call to `.copy_` in a `with torch.no_grad()` block to properly disable autograd during the weight update.
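Applied to the snippet above, the fix would look like this (with a placeholder module standing in for `meta_i3d`):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
meta_i3d = nn.Linear(4, 2)   # placeholder for the real meta-learning model
before = [p.detach().clone() for p in meta_i3d.parameters()]

loss = meta_i3d(torch.randn(5, 4)).pow(2).mean()
loss.backward()

with torch.no_grad():        # disables autograd, so copy_ on leaves is allowed
    for p in meta_i3d.parameters():
        new_val = p - 0.0001 * p.grad
        p.copy_(new_val)
```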
By the way, why is using the in-place copy needed?