Gradients for DQL

Hi, I am trying to make a simple AI that learns to walk down hallways. Its inputs are what is to its left, right, above, and below, and its output actions are move left, right, up, or down. The reward system is as follows:

  • -1 if it tries to walk into a wall
  • -1 if it walks over a space it has already walked over
  • +1 if it walks over a space it has not walked over yet
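Here is a minimal sketch of this environment. It assumes a hypothetical grid of strings where `#` is a wall and `.` is open floor, that the hallway is fully surrounded by walls, and that a move into a wall leaves the agent where it is. None of these details are stated in the post, and `observe` and `step` are illustrative names, not part of any library:

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]  # (row, column)

# Action order matches the inputs: left, right, up, down (assumed encoding).
MOVES: Dict[str, Pos] = {
    "left": (0, -1),
    "right": (0, 1),
    "up": (-1, 0),
    "down": (1, 0),
}


def observe(grid: List[str], pos: Pos) -> List[int]:
    """Return [left, right, up, down]: 1 if that neighbour is a wall, else 0.

    Assumes the grid is surrounded by walls, so indices never go out of range.
    """
    r, c = pos
    return [1 if grid[r + dr][c + dc] == "#" else 0 for dr, dc in MOVES.values()]


def step(grid: List[str], pos: Pos, visited: Set[Pos], action: str) -> Tuple[Pos, int]:
    """Apply one move and return (new_position, reward) using the post's reward rules."""
    dr, dc = MOVES[action]
    nxt = (pos[0] + dr, pos[1] + dc)
    if grid[nxt[0]][nxt[1]] == "#":
        return pos, -1  # tried to walk into a wall: stay put (assumption)
    if nxt in visited:
        return nxt, -1  # walked over a space already walked over
    visited.add(nxt)
    return nxt, 1  # walked over a new space
```

For example, on the one-row hallway `["#####", "#...#", "#####"]` starting at `(1, 1)`, moving right earns +1, moving back left earns -1, and moving up into the wall earns -1 without changing position.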

[Image: Hallway Walk]

I am trying to use the following loss function, based on a TRPO algorithm, to train the neural network:

[Image: loss function]
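The image itself is not recoverable here, but the per-sample squared error that the code in this post computes can be written as follows. This is read off the code, not the image, so it may not match the TRPO-based formula exactly:

$$
\ell_i = \Big( r_i + \max_{a'} Q(s'_i, a') - \max_{a} Q(s_i, a) \Big)^2
$$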

I am able to calculate the loss value. However, I cannot run loss.backward() without getting the following error:

    element 0 of tensors does not require grad and does not have a grad_fn
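The error can be reproduced in isolation. In this sketch a random tensor `q` (a hypothetical stand-in for the model output) plays the role of the Q-values. Wrapping each row maximum in `torch.tensor(...)` copies the numbers into brand-new tensors that have no connection to `q`, so `backward()` has nothing to differentiate. Taking the maximum with a tensor operation keeps the graph intact:

```python
import torch

torch.manual_seed(0)
# Stand-in for model(stateBatch): 4 states x 4 actions (left, right, up, down).
q = torch.randn(4, 4, requires_grad=True)

# Pattern that triggers the error: torch.tensor() builds new tensors from
# the values, detaching them from q (and therefore from the model weights).
detached = torch.stack([torch.tensor([row.max()]) for row in q])

error_message = ""
try:
    (detached ** 2).mean().backward()
except RuntimeError as err:
    error_message = str(err)  # same RuntimeError as in the post

# Reducing with a tensor op keeps the result attached to q.
attached = q.max(dim=1).values
(attached ** 2).mean().backward()  # works; gradients flow back into q
```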

Below is the code I am using:

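    model.opt.zero_grad()

    # Take the max with a tensor op; wrapping values in torch.tensor()
    # creates new tensors that are cut off from the autograd graph.
    qEval = model(stateBatch).max(dim=1).values

    # The bootstrap target should not receive gradients.
    with torch.no_grad():
        qValsNext = model(nextStateBatch).max(dim=1).values

    # Flatten rewards to shape (N,) to avoid accidental broadcasting, and
    # average so that backward() is called on a scalar.
    loss = ((rewardBatch.reshape(-1) + qValsNext - qEval) ** 2).mean()
    loss.backward()
    model.opt.step()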
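To check the corrected update end to end, here is a self-contained sketch on random data. `QNet` is a hypothetical stand-in for the poster's model (4 wall sensors in, 4 action values out) that keeps its optimizer in `model.opt`, as the post does. The loss is the same undiscounted one used in the post:

```python
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Hypothetical Q-network: 4 inputs (left, right, up, down) -> 4 action values."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 4))
        self.opt = torch.optim.Adam(self.parameters(), lr=1e-3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def update(model: QNet, stateBatch: torch.Tensor, rewardBatch: torch.Tensor,
           nextStateBatch: torch.Tensor) -> float:
    """One gradient step on the post's loss; returns the loss as a Python float."""
    model.opt.zero_grad()
    qEval = model(stateBatch).max(dim=1).values        # stays in the graph
    with torch.no_grad():
        qValsNext = model(nextStateBatch).max(dim=1).values  # fixed target
    loss = ((rewardBatch.reshape(-1) + qValsNext - qEval) ** 2).mean()
    loss.backward()
    model.opt.step()
    return loss.item()


torch.manual_seed(0)
model = QNet()
states = torch.randint(0, 2, (8, 4)).float()       # random wall readings
next_states = torch.randint(0, 2, (8, 4)).float()
rewards = torch.tensor([1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0])

before = [p.detach().clone() for p in model.parameters()]
loss_value = update(model, states, rewards, next_states)
```

Detaching the next-state values is the usual choice in DQN-style training, because the target is treated as a fixed label for the current step rather than something the optimizer should move.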

Thank you for the help!