self.value_optimizer.zero_grad()
# Here, when you unpack the data, you detach the data from the graph
# No backpropagation through the model is possible, because you got rid
# of the reference to the graph.
preValue = self.value_net(state).data
nextValue = self.value_net(next_state).data
expectedValue = (self.gamma * nextValue) + reward
# Here, you repack the tensors in Variables, but the history of
# operations is not retained - they are leaf Variables.
# Also you didn't specify that they require gradients (they don't
# by default).
preValue = Variable(preValue)
expectedValue = Variable(expectedValue)
loss = F.smooth_l1_loss(preValue, expectedValue)
# At this point your whole graph looks like this - no model there:
#  preValue    expectedValue
#       \         /
#      smooth_l1_loss
#            |
#          loss
loss.backward()
self.value_optimizer.step()
If I understand correctly, it seems that you want to do Q-learning. You might want to take a look at our DQN tutorial.
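To make the point concrete, here is a minimal sketch of the corrected update: keep the prediction attached to the graph (no `.data`, no re-wrapping), and detach only the bootstrap target. It uses current PyTorch, where Variables and tensors are merged, and the network, shapes, and hyperparameters (`value_net`, `gamma`, the 4-dim state) are made-up stand-ins, not your actual model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tiny value network and one transition, just for illustration.
value_net = nn.Linear(4, 1)
optimizer = torch.optim.SGD(value_net.parameters(), lr=0.01)
gamma = 0.99

state = torch.randn(1, 4)
next_state = torch.randn(1, 4)
reward = torch.tensor([[1.0]])

optimizer.zero_grad()
pre_value = value_net(state)                 # stays in the graph: no .data
next_value = value_net(next_state).detach()  # detach ONLY the bootstrap target
expected_value = gamma * next_value + reward
loss = F.smooth_l1_loss(pre_value, expected_value)
loss.backward()                              # gradients now flow into value_net
optimizer.step()
```

The asymmetry is deliberate: in Q-learning the target is treated as a fixed regression label, so gradients should not flow through `next_value`, but they must flow through `pre_value` or the optimizer has nothing to update.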
When I try to use only
preValue = self.value_net(state) # here without .data
preValue = Variable(preValue) # get rid of this line.
it works. Maybe the Variable and tensor mechanisms are not compatible for .backward()? Otherwise I get:
File "/home/tommy/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 676, in __add__
    return self.add(other)
File "/home/tommy/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 286, in add
    return self._add(other, False)
File "/home/tommy/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 282, in _add
    assert not torch.is_tensor(other)
AssertionError