Q-network only works when I flip a term to the other side of the update equation

I am using the OpenAI Gym CartPole-v0 environment. Here is the code that doesn't work (optimizer.zero_grad() and optimizer.step() are performed outside the function):

def make_step(model, optimizer, criterion, observation, action, reward, next_observation):
    inp = torch.from_numpy(observation)
    # Current Q-value estimates for this state, detached so they can be edited into a fixed target
    target = model(torch.from_numpy(observation)).detach().numpy()

    # Q-value estimates for the next state
    next_target = model(torch.from_numpy(next_observation)).detach().numpy()

    # Value of the best action in the next state
    new_reward = np.max(next_target)

    # Bellman target for the action actually taken: immediate reward + best next-state value
    target[action] = reward
    target[action] += new_reward

    obv_reward = model(inp.double())
    target_reward = torch.from_numpy(target)

    loss = criterion(obv_reward, target_reward)
    loss.backward()

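For context, make_step is called from a training loop roughly like this. This is a minimal sketch using the old Gym step/reset API; the network architecture, learning rate, epsilon, and episode count below are placeholders rather than my exact values:

import gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v0")

# Placeholder network: 4-dimensional observation in, one Q-value per action out
model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2)).double()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for episode in range(500):
    observation = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection (epsilon fixed here for brevity)
        if np.random.rand() < 0.1:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = int(torch.argmax(model(torch.from_numpy(observation).double())))

        next_observation, reward, done, _ = env.step(action)

        optimizer.zero_grad()
        make_step(model, optimizer, criterion, observation, action, reward, next_observation)
        optimizer.step()

        observation = next_observation
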
On running this, the agent learns nothing and never gets more than about 10 reward per episode.
Now if I flip the gamma (next-state) term to the left-hand side of the equation and remove the network's foresight from the target, it does slightly better, achieving around 30-120 reward:

def make_step(model, optimizer, criterion, observation, action, reward, next_observation):
    inp = torch.from_numpy(observation)
    target = model(torch.from_numpy(observation)).detach().numpy()

    #next_target = model(torch.from_numpy(next_observation)).detach().numpy()

    #new_reward = np.max(next_target)

    # Target is now only the immediate reward
    target[action] = reward
    #target[action] += new_reward

    # The next-state Q-values are subtracted on the prediction side instead
    obv_reward = model(inp.double()) - model(torch.from_numpy(next_observation))
    target_reward = torch.from_numpy(target)

    loss = criterion(obv_reward, target_reward)
    loss.backward()

Why does the first version not work, and how do I fix it?
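
For reference, the update I believe the first version should be implementing is the standard Q-learning target, Q(s, a) <- r + gamma * max_a' Q(s', a'). A minimal sketch of that target with an explicit discount factor (gamma = 0.99 is just a placeholder; my code above effectively uses gamma = 1 and does not treat terminal states specially):

gamma = 0.99  # placeholder discount factor, not present in my code above

# Q-learning target for the action actually taken
target[action] = reward + gamma * np.max(next_target)

# For a terminal next_observation the bootstrap term is usually dropped:
# target[action] = reward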