PyTorch DQN tutorial - where is autograd?

https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

While the comments in the tutorial state that autograd is used, it is never explicitly declared (as far as I can see). In supervised learning, the inputs are usually wrapped as input_data = Variable(input_data) and then out = net.forward(data). However, here Variable is never used. I do see that the loss tensor contains a gradient - but I am not sure where this came from.
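For reference, this is roughly the pattern I am used to from supervised learning (a minimal sketch with a dummy linear layer, not the tutorial's actual code):

import torch
import torch.nn as nn
from torch.autograd import Variable

net = nn.Linear(10, 2)             # stand-in for a real network
input_data = torch.randn(4, 10)    # dummy batch
input_data = Variable(input_data)  # the explicit wrapping I am used to
out = net(input_data)              # forward pass
loss = out.sum()
loss.backward()                    # gradients end up in net's parameters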

Another observation: if I set
state_action_values = Variable(state_action_values, requires_grad=True)
then the code will not run - throwing an error on:
for param in policy_net.parameters():
param.grad.data.clamp_(-1, 1)

saying that ‘NoneType’ object has no attribute ‘data’ (whereas clearly, before adding the Variable line, it did…)

Any ideas? Why is Variable not necessary here?

Which PyTorch version are you using?
In 0.4.0 Variables and tensors were merged, so that you don’t have to wrap your tensors anymore.
If you are still used to Variables, the migration guide might help.
Also, the current release is 0.4.1. Make sure to update to this version. :wink:
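For example, in 0.4+ a plain tensor flows through a model and tracks history without any wrapping (a minimal sketch with a dummy model):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)        # dummy model; its parameters require gradients
x = torch.randn(4, 10)          # no Variable needed
out = model(x)
print(out.requires_grad)        # True, inherited from the model's parameters
loss = out.pow(2).mean()
loss.backward()                 # populates .grad on the model's parameters
print(model.weight.grad.shape)  # torch.Size([2, 10])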


Wow, thanks for the reply @ptrblck, I didn’t realize that change had been made. I’m going to look into this. I think I’m still on 0.3.1, since my code wasn’t forward compatible last time I tried updating :rofl:

You can find the install instructions on the website.
If you encounter any problems updating PyTorch or your code, just let us know and we can try to help you out.


So the migration guide says tensor history will only be tracked if requires_grad=True, but this is never set in the DQN tutorial. What am I missing?

Also, I checked my version: on this machine I actually compiled from source, so it looks like I have the newest release.

I haven’t explored the tutorial in detail, but from what I know state_action_values is the output of the model and should therefore already require gradients.
Could you check this with state_action_values.requires_grad?

Also, if you re-wrap a Tensor, it will lose its associated computation graph, so you are effectively detaching it.
That’s the reason why .grad is empty in the example you’ve posted.
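Here is a small sketch of that effect (dummy model and shapes; .data is used to make the detachment explicit):

import torch
import torch.nn as nn
from torch.autograd import Variable

policy_net = nn.Linear(4, 2)                   # stand-in for the DQN
state_batch = torch.randn(8, 4)

state_action_values = policy_net(state_batch)  # attached to the graph
rewrapped = Variable(state_action_values.data, requires_grad=True)  # new leaf, history is cut

rewrapped.sum().backward()
print(policy_net.weight.grad)                  # None - the gradient never reaches the model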

Yes, state_action_values is indeed the output of the policy_net, which takes state_batch as input. I checked, and state_action_values.requires_grad is True even though it is never explicitly set in the code. (I guess this happens by default when passing tensors through a model, unless you are inside torch.no_grad() - right?)

Yes, as the model will have some parameters requiring gradients, this property will be passed on:

import torch

a = torch.randn(1)                      # plain tensor, requires_grad defaults to False
b = torch.randn(1)                      # plain tensor, requires_grad defaults to False
c = torch.randn(1, requires_grad=True)  # leaf tensor that requires gradients

d = a * b
print(d.requires_grad)  # False - neither input requires gradients

e = a * c
print(e.requires_grad)  # True - c requires gradients, so the result does too
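And, as you said, wrapping the computation in torch.no_grad() disables the tracking, even if an input requires gradients:

with torch.no_grad():
    f = a * c
print(f.requires_grad)  # False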

Yes, your example is clear. I see - because the Parameters inside the network modules automatically have requires_grad=True, everything that passes through the network ends up requiring gradients as well.

I just modified the DQN tutorial to load a pre-trained network. It looks like it was trained on an older PyTorch version (and used Variable). Now I run into the same problem as before.

I guess this is because the stored model dictionary had a Variable in it, right (their module defined a Variable within the network itself)? Does this mean that I can’t use older pre-trained networks on torch 0.4+?

OK, I performed some network surgery: I redefined the network and only loaded the state_dict entries for the modules that still existed (removing the method that used Variable). Still no dice. I added requires_grad=True to the tensors, but there are still parameters in the network without a gradient. Not sure what is going on, but it feels like something might be wrong. (Note that everything is fine when I am training from scratch - I can load those models back in, etc.)
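For what it's worth, the surgery looked roughly like this (just a sketch - the checkpoint file name and the DQN constructor are placeholders for my actual code):

import torch

checkpoint = torch.load('old_dqn.pth')  # hypothetical path to the old state_dict
new_net = DQN()                         # re-defined network without the old Variable code
model_dict = new_net.state_dict()

# keep only entries whose names and shapes still match the new definition
filtered = {k: v for k, v in checkpoint.items()
            if k in model_dict and v.shape == model_dict[k].shape}
model_dict.update(filtered)
new_net.load_state_dict(model_dict)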

As far as I understand, the problem occurs when you want to use the new model code with an old state_dict?
Could you create a small executable code snippet so that I can have a look at it?


Yep, that did the trick :slight_smile:
Thanks!