I recently created a topic about this, but the problem has since been reduced to a simpler one.
The problem is that my DQN, after training, always returns the same output for whatever input I give it. The DQN looks as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class DQN(nn.Module):
    def __init__(self, lr, input_dims, fc1_dims, fc2_dims, n_actions):
        super(DQN, self).__init__()
        self.input_dims = input_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.n_actions = n_actions
        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)  # the tutorial uses *self.input_dims here
        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        self.out = nn.Linear(self.fc2_dims, self.n_actions)
        self.optimizer = optim.Adam(self.parameters(), lr=lr)
        self.loss = nn.MSELoss()
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, observation):
        state = torch.Tensor(observation).to(self.device)
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        actions = self.out(x)
        return actions
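For context, a quick sanity check (a sketch; the learning rate and layer sizes below are placeholders for whatever my agent actually uses) on a freshly initialized network:

net = DQN(lr=0.001, input_dims=[9], fc1_dims=64, fc2_dims=64, n_actions=9)
with torch.no_grad():
    # Two distinct board states
    print(net.forward([1, 0, 0, -1, 0, 0, 0, 0, 0]))
    print(net.forward([1, 1, 0, -1, 0, -1, 0, 0, 0]))

Before training, the two prints differ, as expected.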
After training, if I type something like:
print(trained_brain.Q_eval.forward([1, 0, 0, -1, 0, 0, 0, 0, 0]))
print(trained_brain.Q_eval.forward([1, 1, 0, -1, 0, -1, 0, 0, 0]))
I get the results:
tensor([0.2661, 0.1787, 0.1648, 0.1722, 0.2262, 0.1747, 0.2077, 0.1326, 0.2635],
       grad_fn=<AddBackward0>)
tensor([0.2661, 0.1787, 0.1648, 0.1722, 0.2262, 0.1747, 0.2077, 0.1326, 0.2635],
       grad_fn=<AddBackward0>)
despite the fact that the inputs are different. I assume there might be a problem with the data types, but I cannot figure out where.
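This is the kind of check I have in mind to rule out a type mismatch (a sketch; trained_brain.Q_eval is the trained network from above):

obs = [1, 0, 0, -1, 0, 0, 0, 0, 0]
state = torch.Tensor(obs)                     # torch.Tensor defaults to float32
print(state.dtype)                            # torch.float32
print(trained_brain.Q_eval.fc1.weight.dtype)  # also torch.float32
# Explicit conversion, in case observations arrive as a numpy int array:
state = torch.tensor(obs, dtype=torch.float32)

Both dtypes come out as torch.float32, so if the data types are the culprit, the mismatch does not seem to be inside forward itself.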