I don't fully understand the eps_decay parameter in the DQN example. It is explained that eps_start represents the chance to select a random action at the beginning of the training and eps_end is the chance to select a random action at the end of the training. With these two variables, one would assume the decay rate from eps_start to eps_end would depend on the number of episodes the training takes. So, going from a 0.9 chance to 0.1 is a difference of 0.8, and you can divide this by the number of episodes that will happen to get the decay rate per episode. So, my question is: how does eps_decay come into this? I can understand that if you want it to fully decay not at the end of the training but, let’s say, at half of the episodes, you could double the decay rate. However, the given example parameter is, for example, 200 in the CartPole environment. What does this 200 represent?
TLDR: it’s just a fancy way to decay the epsilon
As you say, if you want epsilon to finish decaying earlier, say halfway through training, you can control that with this parameter. Have a look at this link, where you can change the value to see how the decay changes. For example, if eps_decay = 100, then eps will be about 0.363 at time step 100.