Q-learning. Can you tell me if the output is correct?

MACHOCAPTCHA · March 12, 2023, 6:06pm

I compute the Q-learning error with the function:
‘MSELoss(q_value, target_q_value)’
I build ‘target_q_value’ as a copy of ‘q_value’, changing only the cell with the maximum value (cell = action):
‘target_q_value[action] = (reward + self.gamma * max_next_q_value)’
Then I pass both tensors to ‘MSELoss’,
and the rest is the standard update:
‘self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()’
For example, suppose the output has six cells:
[5.0, 3.2, 15.2, 9.7, 2.3, 7.7]
Here I have to change cell number 2 (15.2),
and I get, for example, this tensor for ‘target_q_value’:
[5.0, 3.2, 14.2, 9.7, 2.3, 7.7].
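To make the numbers concrete, here is a minimal sketch in plain Python (no PyTorch, so the arithmetic stays visible). It builds the target as a copy with one changed cell, then evaluates the mean squared error and its gradient with respect to each output cell. It assumes the loss is averaged over all six cells, which is the default reduction of ‘MSELoss’; the helpers ‘mse’ and ‘mse_grad’ are made-up names for illustration.

```python
# Plain-Python sketch of the example above; mse and mse_grad are
# illustrative helpers, not PyTorch functions.

def mse(pred, target):
    """Mean squared error, averaged over all cells (mean reduction)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mse_grad(pred, target):
    """Gradient of the mean squared error with respect to each cell of pred."""
    n = len(pred)
    return [2.0 * (p - t) / n for p, t in zip(pred, target)]


q_value = [5.0, 3.2, 15.2, 9.7, 2.3, 7.7]

# Target = copy of the prediction with only the chosen cell replaced.
action = 2
target_q_value = list(q_value)
target_q_value[action] = 14.2  # stands in for reward + gamma * max_next_q_value

loss = mse(q_value, target_q_value)
grad = mse_grad(q_value, target_q_value)

print(round(loss, 4))               # 0.1667
print([round(g, 4) for g in grad])  # [0.0, 0.0, 0.3333, 0.0, 0.0, 0.0]
```

With this target, the error signal at the output is nonzero only in cell 2. How the other cells react after the optimizer step depends on which weights they share with cell 2.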
Now, after ‘MSELoss’, ‘backward’ and the optimizer step, should the value decrease in just that one cell of ‘q_value’, or in all of them?
I see a change in all the cells: they all fall at the same time to a certain point, and after that they stop moving. It seems to me that something is wrong here.
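One thing worth checking is whether a change in all cells is simply a result of shared weights. The sketch below is a toy model, not the network from this post: a single shared hidden value ‘h = v * x’ feeds six output weights ‘w’, and one hand-written gradient-descent step is applied with the target from the example. All names (‘forward’, ‘v’, ‘w’, ‘lr’) and the learning rate are assumptions for illustration.

```python
# Toy model (hypothetical, not the poster's network): q[i] = w[i] * (v * x).
# The hidden value h = v * x is shared by all six outputs.

def forward(v, w, x):
    h = v * x
    return [wi * h for wi in w]


x = 1.0
v = 1.0
w = [5.0, 3.2, 15.2, 9.7, 2.3, 7.7]
lr = 0.01  # assumed learning rate

q_before = forward(v, w, x)
target = list(q_before)
target[2] = 14.2  # only cell 2 differs from the prediction

# Gradient of the mean squared error with respect to each output cell.
n = len(q_before)
dq = [2.0 * (q - t) / n for q, t in zip(q_before, target)]

# Backpropagate by hand: dL/dw[i] = dq[i] * h, dL/dv = sum(dq[i] * w[i]) * x.
h = v * x
dw = [g * h for g in dq]
dv = sum(g * wi for g, wi in zip(dq, w)) * x

# One plain gradient-descent step.
w_after = [wi - lr * g for wi, g in zip(w, dw)]
v_after = v - lr * dv
q_after = forward(v_after, w_after, x)

for before, after in zip(q_before, q_after):
    print(f"{before:6.2f} -> {after:6.2f}")
```

In this toy setup only ‘w[2]’ and the shared ‘v’ receive a nonzero gradient, yet every output moves, because all of them depend on ‘v’. Whether that explains the behaviour described here depends on the actual network and learning rate.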