PyTorch vs TensorFlow gives different results

Hi all, I am trying to reimplement Arthur Juliani’s Simple Reinforcement Learning with Tensorflow Part 0: Q-Learning with Tables and Neural Networks tutorial with PyTorch. My code is here.

I apologize in advance for not being able to provide more details, but basically I am stuck, and I don’t know what I am doing wrong. I have checked the code line by line, and it appears that I have PyTorch equivalents for all of Arthur’s TensorFlow code, yet my model doesn’t learn.

My suspicion is that I might have misunderstood autograd and dynamic graph building, but I am not sure how or where. I hope you will be able to point out my mistakes.

I was able to get it to sometimes produce results similar to TensorFlow here. However, while TF consistently produces a result of around 0.5, PyTorch varies from 0.5 down to 0.003.

What am I doing wrong?

Sorry, your links don’t seem to work, so I can’t see what you’re doing.

Working link, I suppose:


What I would do, if it were me, would be to go through the network and print out the various intermediate values. Make sure to seed both the TensorFlow and Torch weights etc. with identical values (e.g. using manual_seed() and similar), or just use some deterministic way to initialize them, or initialize them from a file, or something that is repeatable and identical across both TensorFlow and Torch. Then just compare the intermediates between TensorFlow and Torch. I think this will be more reliable than trying to compare the code line by line, since it’s not clear which lines to pay most attention to.
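For instance, a sketch of the deterministic-initialization idea (the 16×4 shape is just the FrozenLake layout from the tutorial; the same NumPy array would be fed to the TF variable initializer on the other side):

```python
import numpy as np
import torch

# Create the initial weights once in NumPy so the *same* array can be loaded
# into both frameworks; manual_seed alone is not enough, because the TF and
# PyTorch RNGs produce different numbers even from the same seed.
np.random.seed(0)
w_init = np.random.uniform(0, 0.01, size=(16, 4)).astype(np.float32)
# (16 states x 4 actions, as in the FrozenLake tutorial)

w1 = torch.from_numpy(w_init.copy())
w1.requires_grad_()

# One-hot state vector, then print the intermediates to compare with TF.
state = torch.zeros(1, 16)
state[0, 3] = 1.0
q = state @ w1
print("Q values:", q.detach().numpy())
```

With identical weights in both frameworks, any divergence in the printed Q values pins down the first place where the two implementations disagree.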

I’m not really clear on what this means:

“As you can see below, we will need to query our network for Q parameters before we update the weights. In PyTorch every call to a Variable adds it to the computational graph, and those will be considered in the backward calculation, which will affect your gradients.”

Backpropagation only happens in the backward direction. So, I’m not really sure how adding additional forward calculations to your graph would change anything. For example, let’s say you have:

x = autograd.Variable(torch.rand(1, 3), requires_grad=True)
out = x * 3

So, you want to back-propagate through out, to get x.grad. So you do e.g.:

out.backward(torch.rand(1, 3))

(just using some dummy random numbers for the grad output).

But suppose that before doing this you do:

a = x * 15 + out

As far as I can tell this is completely benign and won’t change anything. What makes you feel this won’t be the case? Can you come up with some very simple 3-5 line test case to demonstrate the issue you fear you might be seeing?
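Something like this minimal check (written with current-style tensors rather than the old Variable wrapper) is the kind of test case I mean:

```python
import torch

x = torch.rand(1, 3, requires_grad=True)
out = x * 3

# Gradient of out w.r.t. x on its own: every element is exactly 3.
out.backward(torch.ones(1, 3), retain_graph=True)
grad_plain = x.grad.clone()

# Add an extra forward computation that nothing ever backpropagates through...
x.grad.zero_()
a = x * 15 + out                     # benign: a.backward() is never called
out.backward(torch.ones(1, 3))
grad_with_extra = x.grad.clone()

# ...and the gradient through out is unchanged.
print(torch.equal(grad_plain, grad_with_extra))  # True
```

Extra forward ops only matter if you actually call backward() through them; otherwise they just sit in the graph unused.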

If the link above to the notebook for PyTorch is the right one: it looks like you are calling backward() on your loss function, but with no Variable in it to compute gradients for, effectively making it useless. So your results are probably coming out random, as the model is not learning anything and is just using its random search policy. Without a tensor with a Variable wrapper you cannot auto-compute your gradients with backward(), and since the only Variable I see in the code is set to requires_grad=False, nothing is being computed.
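A minimal sketch of that failure mode (modern tensor API rather than the Variable wrapper; the 16×4 shape is just the FrozenLake layout from the tutorial):

```python
import torch

# No tensor below requires gradients, so autograd has nothing to track
# and backward() raises an error.
w = torch.rand(16, 4)               # requires_grad defaults to False
state = torch.rand(1, 16)
loss = (state @ w).pow(2).sum()

try:
    loss.backward()
except RuntimeError as err:
    print("backward() failed:", err)

# With requires_grad=True on the weights, the same call succeeds and
# fills w.grad with d loss / d w.
w = torch.rand(16, 4, requires_grad=True)
loss = (state @ w).pow(2).sum()
loss.backward()
print(w.grad.shape)                 # torch.Size([16, 4])
```

(With the old Variable API the symptom is subtler: backward() may run but silently update nothing, which looks exactly like a model that never learns.)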

Oops, sorry, the code got moved, but yes, essentially that is the correct notebook. I am trying to implement your recommendations, and I think this confused me even more.

What I am trying to do is to calculate the gradient of the loss w.r.t. w1 and update it. My y values are the expected Q_target, and my y_pred is just Q from the prior step.
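For what it’s worth, a sketch of a manual gradient step along those lines (the names w1 / Q_target follow the thread; the shapes and learning rate are made up):

```python
import torch

# Hypothetical shapes: 16 states x 4 actions, as in the FrozenLake tutorial.
lr = 0.1
w1 = torch.rand(16, 4, requires_grad=True)
state = torch.zeros(1, 16)
state[0, 0] = 1.0                    # one-hot current state

q_pred = state @ w1                  # Q from the forward pass (y_pred)
q_target = q_pred.detach().clone()   # start from the same values...
q_target[0, 2] = 1.0                 # ...pretend action 2 earned reward 1 (y)

loss = (q_target - q_pred).pow(2).sum()
loss.backward()                      # fills w1.grad with d loss / d w1

with torch.no_grad():                # the update itself must not be tracked
    w1 -= lr * w1.grad
w1.grad.zero_()                      # clear before the next iteration
```

Note the detach() on the target and the no_grad() around the update: without them the target and the weight update themselves become part of the graph.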

Looking at the other tutorials, they don’t pass anything to backward(), and backward() works on all Variables with requires_grad=True.

Based on that, I thought that as long as w1 is present in the computation, its gradients will be computed. Is that not the case?

Edit: also, after trying multiple iterations of the code here, it seems to be producing a good result, but why?

I have done more tests, and it seems like the original model is actually suffering from this too.

Just to summarize:

After doing some experiments and reading tutorials, I am still not sure what other approaches can be taken with regard to the .backward function.

My current understanding of how the dynamic graph works in PyTorch is the following: when you create a Variable with requires_grad=True and perform some operations on it, then once you reach some scalar Variable, for example a loss, and call loss.backward(), all Variables with requires_grad=True will have their derivatives with respect to the loss stored in their .grad attribute. Is this correct?
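That understanding can be checked in a few lines (a sketch with the modern tensor API; .grad is filled for every leaf created with requires_grad=True, and stays None otherwise):

```python
import torch

a = torch.rand(3, requires_grad=True)
b = torch.rand(3, requires_grad=True)
c = torch.rand(3)                    # requires_grad=False

loss = (a * b + c).sum()             # scalar, so backward() needs no argument
loss.backward()

print(a.grad)                        # d loss / d a, i.e. the values of b
print(b.grad)                        # d loss / d b, i.e. the values of a
print(c.grad)                        # None: c was never tracked
```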

Maybe someone can try to reimplement his tutorial showing the proper usage of PyTorch? I, personally, would greatly appreciate it.