Problem training a neural network in reinforcement learning

self.value_optimizer.zero_grad()
preValue = self.value_net(state).data
nextValue = self.value_net(next_state).data
expectedValue = (self.gamma * nextValue) + reward

preValue = Variable(preValue)
expectedValue = Variable(expectedValue)
loss = F.smooth_l1_loss(preValue, expectedValue)
loss.backward()
self.value_optimizer.step()

**self._execution_engine.run_backward((self,), (gradient,), retain_variables)**
**RuntimeError: there are no graph nodes that require computing gradients**

When I run this code, the error occurs at loss.backward(). What can I do to solve it?

I think you want to do Variable(..., requires_grad=True) where appropriate

self.value_optimizer.zero_grad()
# Here, when you unpack the data, you detach the data from the graph
# No backpropagation through the model is possible, because you got rid
# of the reference to the graph.
preValue = self.value_net(state).data
nextValue = self.value_net(next_state).data
expectedValue = (self.gamma * nextValue) + reward

# Here, you repack the tensors in Variables, but the history of
# operations is not retained - they are leaf Variables.
# Also you didn't specify that they require gradients (they don't
# by default).
preValue = Variable(preValue)
expectedValue = Variable(expectedValue)
loss = F.smooth_l1_loss(preValue, expectedValue)
# At this point your whole graph looks like this - no model there:
#  preValue        expectedValue
#         \          /
#        smooth_l1_loss
#              |
#            loss
loss.backward()
self.value_optimizer.step()
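
To make the "leaf Variable" point concrete, here is a minimal, self-contained sketch (the Linear layer and the shapes are just stand-ins for your value_net): rewrapping .data produces a Variable with no history and requires_grad=False, so there is nothing for backward() to differentiate.

import torch
import torch.nn as nn
from torch.autograd import Variable

net = nn.Linear(4, 1)                 # stand-in for value_net
state = Variable(torch.randn(1, 4))   # made-up state shape

out = net(state)                      # still connected to the graph
print(out.requires_grad)              # True - backward() can reach net's weights

leaf = Variable(out.data)             # rewrapping .data drops the history
print(leaf.requires_grad)             # False - a leaf with no graph behind it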

If I understand correctly, it seems that you want to do Q-learning. You might want to take a look at our DQN tutorial.


I have tried it, but it does not seem to work.

When I try to use only
preValue = self.value_net(state) # here without .data
preValue = Variable(preValue) # get rid of this line.
it works. Maybe the Variable and Tensor mechanisms are not compatible with .backward().

Try this:

self.value_optimizer.zero_grad()
preValue = self.value_net(state)
nextValue = self.value_net(next_state).detach() # don't backprop this way
expectedValue = (self.gamma * nextValue) + reward
loss = F.smooth_l1_loss(preValue, expectedValue)
loss.backward()
self.value_optimizer.step()
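
For what it's worth, .detach() gives you a Variable that shares the same data but is cut out of the graph, so the loss still backpropagates into preValue while nothing flows back through the next_state pass. A small sketch with a stand-in net (names and shapes assumed):

import torch
import torch.nn as nn
from torch.autograd import Variable

net = nn.Linear(4, 1)                     # stand-in for value_net
next_state = Variable(torch.randn(1, 4))  # made-up state shape

nextValue = net(next_state).detach()      # same values, detached from the graph
print(nextValue.requires_grad)            # False - gradients will not reach net through this path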

I have tried it, but this problem occurs:

expectedValue = (self.gamma * nextValue) + reward

  File "/home/tommy/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 676, in __add__
    return self.add(other)
  File "/home/tommy/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 286, in add
    return self._add(other, False)
  File "/home/tommy/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 282, in _add
    assert not torch.is_tensor(other)
AssertionError

reward is a Tensor, but it should be a Variable.
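
Assuming reward arrives as a plain Tensor, a minimal sketch of the fix (not tested against your full code) is to wrap it in a Variable before the arithmetic:

self.value_optimizer.zero_grad()
preValue = self.value_net(state)
nextValue = self.value_net(next_state).detach()  # no backprop through the target path
expectedValue = (self.gamma * nextValue) + Variable(reward)  # wrap the raw reward Tensor
loss = F.smooth_l1_loss(preValue, expectedValue)
loss.backward()
self.value_optimizer.step()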