I recently created a topic about this, but the problem has since been reduced to a simpler one.
The problem is that my DQN, after training, always returns the same output for whatever input I give it. The DQN looks as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class DQN(nn.Module):
    def __init__(self, lr, input_dims, fc1_dims, fc2_dims, n_actions):
        super(DQN, self).__init__()
        self.input_dims = input_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.n_actions = n_actions
        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)  # the tutorial uses *self.input_dims here
        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        self.out = nn.Linear(self.fc2_dims, self.n_actions)
        self.optimizer = optim.Adam(self.parameters(), lr=lr)
        self.loss = nn.MSELoss()
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, observation):
        state = torch.Tensor(observation).to(self.device)
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        actions = self.out(x)
        return actions
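For context, a quick sanity check (a sketch; the learning rate and layer sizes below are placeholders for whatever my agent actually uses) on a freshly initialized network:

net = DQN(lr=0.001, input_dims=[9], fc1_dims=64, fc2_dims=64, n_actions=9)
with torch.no_grad():
    # Two distinct board states
    print(net.forward([1, 0, 0, -1, 0, 0, 0, 0, 0]))
    print(net.forward([1, 1, 0, -1, 0, -1, 0, 0, 0]))

Before training, the two prints differ, as expected.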
After training, if I type something like:
print(trained_brain.Q_eval.forward([1, 0, 0, -1, 0, 0, 0, 0, 0]))
print(trained_brain.Q_eval.forward([1, 1, 0, -1, 0, -1, 0, 0, 0]))
I get the results:
tensor([0.2661, 0.1787, 0.1648, 0.1722, 0.2262, 0.1747, 0.2077, 0.1326, 0.2635],
       grad_fn=<AddBackward0>)
tensor([0.2661, 0.1787, 0.1648, 0.1722, 0.2262, 0.1747, 0.2077, 0.1326, 0.2635],
       grad_fn=<AddBackward0>)
despite the fact that the inputs are different. I assume there might be a problem with the data types, but I cannot figure out where.
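This is the kind of check I have in mind to rule out a type mismatch (a sketch; trained_brain.Q_eval is the trained network from above):

obs = [1, 0, 0, -1, 0, 0, 0, 0, 0]
state = torch.Tensor(obs)                     # torch.Tensor defaults to float32
print(state.dtype)                            # torch.float32
print(trained_brain.Q_eval.fc1.weight.dtype)  # also torch.float32
# Explicit conversion, in case observations arrive as a numpy int array:
state = torch.tensor(obs, dtype=torch.float32)

Both dtypes come out as torch.float32, so if the data types are the culprit, the mismatch does not seem to be inside forward itself.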