I am currently trying to replicate the DQN tutorial in my own way; however, as the DQN runs, the loss doesn't change at all, so I believe the network isn't actually being optimized. From looking at other similar questions, I suspect the problem is in how I constructed the computational graph, but I don't know exactly how to fix it in this particular case.
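For context, the surrounding setup looks roughly like the sketch below; the 4-dimensional state, two discrete actions, and the RMSprop optimizer here are placeholders to make the snippet self-contained, not necessarily the exact configuration:

import random
from collections import deque

import torch.nn as nn
import torch.optim as optim

batch_size = 128
gamma = 0.999

# Placeholder network: 4-dimensional states in, two discrete actions out
model = nn.Sequential(
    nn.Linear(4, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
optimizer = optim.RMSprop(model.parameters())

# Replay memory of (state, action, reward, next_state) tuples
D = deque(maxlen=10000)

The optimization step itself, where I think the problem is, looks like this: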
import random

import torch
from torch import FloatTensor
from torch.autograd import Variable


def optimize_model():
    # Sample a random minibatch of transitions from the replay memory D
    batch = random.sample(D, batch_size)
    state_batch = []
    action_batch = []
    reward_batch = []
    next_state_batch = []
    for state, action, reward, next_state in batch:
        state_batch.append(state)
        action_batch.append(action)
        reward_batch.append(reward)
        next_state_batch.append(next_state)

    # Q(s, a): pick out the Q-value of the action actually taken in each state
    q_values = model(Variable(FloatTensor(state_batch)))
    if len(D) < 128:
        # gather with an index of shape (1, batch)
        q_values = q_values.gather(1, Variable(torch.LongTensor([action_batch])))
    else:
        # gather with the index transposed to shape (batch, 1)
        q_values = q_values.gather(1, Variable(torch.t(torch.LongTensor([action_batch]))))

    # max_a' Q(s', a'): forward pass marked volatile so no graph is built,
    # then the volatile flag is cleared so the result can be used in the loss
    next_state_values = model(Variable(FloatTensor(next_state_batch), volatile=True)).max(1)[0]
    next_state_values.volatile = False
    expected_state_action_values = (next_state_values * gamma) + Variable(FloatTensor(reward_batch))

    # Huber loss between predicted Q-values and the bootstrapped targets
    loss = torch.nn.functional.smooth_l1_loss(q_values, expected_state_action_values)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
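To back up the claim that nothing is being optimized, one way to check is to print the gradient norms of the parameters right after loss.backward(); a minimal diagnostic sketch, assuming model is the same network used in optimize_model():

# Diagnostic sketch: print the gradient norm of every parameter after
# loss.backward() to see whether any learning signal reaches the network.
def print_grad_norms(model):
    for name, param in model.named_parameters():
        if param.grad is None:
            print(name, "-> no gradient at all")
        else:
            print(name, "->", param.grad.data.norm())

If every parameter shows no gradient or a norm of exactly zero across iterations, the loss has been disconnected from the model parameters somewhere in the graph, which is what I suspect is happening here.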