Hello, I am new here.
My name is Chris and I study CS.
I am implementing the DQN algorithm in PyTorch
and I made a simple example.
It has 2 states and 4 actions.
In state 1, action 1 gets a reward of +5, and in state 2, action 3 gets a reward of -5.
After training for a while, the Q-function outputs for both states are almost equal.
I think that for state 1 the Q-value for action 1 should be close to 5,
and for state 2 the Q-value for action 3 should be close to -5.
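
To make that expectation concrete, here is a minimal tabular sketch of the same toy problem. It is not the PyTorch DQN code in question, only a reference for the values a correct setup would probably converge to. It rests on assumptions not stated above: every other state/action pair gives a reward of 0, each episode ends after a single action (so the target is just the reward, with no bootstrapped next-state term), and states and actions are sampled uniformly. The names `reward` and `train_tabular` are made up for this illustration.

```python
import random

N_STATES, N_ACTIONS = 2, 4

def reward(state, action):
    # 0-based indices: state 0 is "state 1", action 0 is "action 1",
    # action 2 is "action 3".
    if state == 0 and action == 0:
        return 5.0
    if state == 1 and action == 2:
        return -5.0
    return 0.0  # assumption: all other pairs give no reward

def train_tabular(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(steps):
        s = rng.randrange(N_STATES)
        a = rng.randrange(N_ACTIONS)
        # Assumption: one-step episodes, so the target is the reward alone.
        target = reward(s, a)
        # Move only the entry for the action that was actually taken.
        q[s][a] += lr * (target - q[s][a])
    return q

q = train_tabular()
for s, row in enumerate(q):
    print(f"state {s + 1}:", [round(v, 3) for v in row])
```

Under these assumptions the two rows end up clearly different, with the state 1 / action 1 entry near 5 and the state 2 / action 3 entry near -5. If the network output does not depend on the state at all, things that may be worth checking are how the state is encoded as network input and whether the loss is computed only on the Q-value of the action that was taken.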
Does anyone know what I could be doing wrong?
Thank you for your help.