I have a doubt about deep Q-learning

gokul_adethya · March 27, 2020, 1:36pm

I have a doubt about DQN. Say that for the nth state I am getting these Q-values:

qnthvalues = [1,2,3]

Here the max Q-value is selected, which is at position 2 (the 3rd value), so I take action 3 and get the Q-values for state n+1. Now, should I apply the Bellman equation only for that action (the 3rd value of the n+1th Q-values) and leave the other values the same in the target, or apply it to all the values?

qn+1thvalues = [2,3,4]
targetq_values = [2,3,bellmaneq(4)]
or
targetq_values = bellmaneq(qn+1thvalues)  # for all q values of that state

So will we be applying it to all the Q-values, or only to the Q-value of the action taken?

iffiX · June 4, 2020, 5:22pm

target_value = reward (from env) + discount * (1 - terminal) * next_value
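As a rough sketch of how that formula is usually applied to the numbers in the question: only the entry for the action actually taken gets the new target, and the other entries keep the network's current predictions for state n (not the values from state n+1), so they contribute no error. The helper names `td_target` and `build_target_vector`, the reward of 1.0, and the discount of 0.9 are made up for illustration, and next_value is assumed here to be the max over the next state's Q-values.

```python
def td_target(reward, next_value, discount=0.99, terminal=False):
    """target_value = reward + discount * (1 - terminal) * next_value"""
    return reward + discount * (1 - int(terminal)) * next_value


def build_target_vector(q_values, action, target_value):
    """Copy the current state's Q-values and overwrite only the taken action."""
    target = list(q_values)
    target[action] = target_value
    return target


q_n = [1.0, 2.0, 3.0]    # Q-values predicted for state n (from the question)
q_n1 = [2.0, 3.0, 4.0]   # Q-values predicted for state n+1 (from the question)

action = q_n.index(max(q_n))  # greedy action, index 2 (the 3rd value)
reward = 1.0                  # hypothetical reward returned by the environment

# Assumption: next_value is the max Q-value of the next state.
target_value = td_target(reward, max(q_n1), discount=0.9, terminal=False)

# Only index `action` differs from q_n; the other entries are left unchanged.
target_q_values = build_target_vector(q_n, action, target_value)
print(target_q_values)
```

If the episode ended on this step, the `(1 - terminal)` factor drops the next_value term and the target is just the reward.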

You can acquire next_value by using the Q-network itself (vanilla DQN), by using a target network (fixed target), or by selecting an action with the online network and using that action to look up the Q-value in the target network (double DQN).
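To make the three options concrete, here is a small sketch using plain lists in place of network outputs. The function names and the example numbers for `q_online_next` and `q_target_next` are invented for illustration; in a real agent they would be the online and target networks' predictions for the next state.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


def next_value_vanilla(q_online_next):
    # Vanilla DQN: the same network both picks and evaluates the action.
    return max(q_online_next)


def next_value_fixed_target(q_target_next):
    # Fixed target: a periodically synced copy picks and evaluates the action.
    return max(q_target_next)


def next_value_double(q_online_next, q_target_next):
    # Double DQN: the online network picks the action,
    # the target network supplies the value for that action.
    best_action = argmax(q_online_next)
    return q_target_next[best_action]


# Hypothetical predictions for the next state from the two networks.
q_online_next = [2.0, 3.0, 4.0]
q_target_next = [2.5, 3.5, 3.0]

vanilla = next_value_vanilla(q_online_next)
fixed = next_value_fixed_target(q_target_next)
double = next_value_double(q_online_next, q_target_next)
print(vanilla, fixed, double)
```

Whichever variant is used, the resulting next_value is plugged into the target_value formula for the action that was taken.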