Input features in reward function

NHarris · April 24, 2023, 8:25pm

Can the values used as an environment's input also be used to calculate the reward, or would that somehow lessen the agent's performance or "learning"?

J_Johnson · April 25, 2023, 1:20am

That depends on how directly correlated those values are with the desired outcome. The more strongly they correlate with it, the more useful they are in the reward.

For example, in Lunar Lander, the Euclidean distance from the landing spot can be calculated directly from the x, y coordinates, so part of the desired solution is reducing that distance.
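
As a rough sketch of that distance calculation: the snippet below assumes the observation is a sequence whose first two entries are the x and y coordinates and that the landing spot sits at (0, 0). Both are assumptions about the environment's layout, and `distance_to_pad` is just an illustrative name.

```python
import math

def distance_to_pad(obs, pad_xy=(0.0, 0.0)):
    """Euclidean distance from the lander to the landing spot.

    Assumes obs[0] is x and obs[1] is y; pad_xy is the assumed
    position of the landing spot.
    """
    x, y = obs[0], obs[1]
    return math.hypot(x - pad_xy[0], y - pad_xy[1])
```

So the same x, y values the agent sees as input are all that is needed to compute this part of the reward.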

However, if you reduce it too quickly, the velocity will be too high and it will crash. So the other part of the solution is maintaining a safe velocity and decelerating when approaching the landing zone.

So one could construct a reward function with some combination of the above points.
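
One possible combination, purely as a sketch: penalize both the distance to the landing spot and the current speed. The observation layout (x, y, vx, vy in the first four entries), the landing spot at (0, 0), and the two weights are all assumptions here and would need tuning for a real environment.

```python
import math

def shaped_reward(obs, distance_weight=1.0, speed_weight=0.5):
    """Negative weighted sum of distance-to-pad and speed.

    Assumes obs = (x, y, vx, vy, ...) with the landing spot at (0, 0).
    The weights are hypothetical and chosen only for illustration.
    """
    x, y, vx, vy = obs[0], obs[1], obs[2], obs[3]
    distance = math.hypot(x, y)   # how far from the landing spot
    speed = math.hypot(vx, vy)    # how fast the lander is moving
    return -(distance_weight * distance + speed_weight * speed)
```

With this form, being closer and slower both raise the reward, and the ratio of the two weights decides how much speed the agent will trade for progress.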

However, if you simply give the inverse of the Euclidean distance as the reward (i.e. the reward decreases as the distance increases, and vice versa), the reward grows without bound as the distance approaches zero, so it may be far too high at very close distances. So you either need to clip the reward, or replace it with a binary signal: 0 when the distance is unchanged or larger than in the previous frame (the rocket didn’t get closer) and 1 when the distance has decreased (the rocket got closer).
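
Both options can be sketched in a few lines. The function names, the cap `max_reward`, and the small `eps` guard against division by zero are illustrative assumptions, not values from any particular environment.

```python
def clipped_inverse_reward(distance, max_reward=10.0, eps=1e-8):
    """Inverse-distance reward, capped so it cannot blow up near zero.

    max_reward and eps are hypothetical values for illustration.
    """
    return min(1.0 / max(distance, eps), max_reward)

def progress_reward(prev_distance, distance):
    """1 if the lander got closer since the previous frame, else 0."""
    return 1.0 if distance < prev_distance else 0.0
```

The binary version needs the previous frame's distance, so the training loop has to store it between steps (and reset it at the start of each episode).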