I am working on a hexapod that learns how to walk through RL. Pretty much all of the code is written; I'm now working on training it. The issue is that I don't think my reward function is good enough, and I need some help designing it.
My basic structure is that the hexapod has a camera feeding a CNN that draws bounding boxes around people. I want the hexapod to maximize the area of the bounding box (meaning walk closer to the person) and minimize the distance between the center of the bounding box and the center of the frame (to make sure it's walking toward the person). My current reward function is:
return (-1) * (distance**2 * area**2)
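For context, here is roughly how I compute `distance` and `area` from a detection. The variable names and the bounding-box format are placeholders standing in for my actual CNN output, and the normalization is just how I've set it up:

```python
import math

def bbox_features(bbox, frame_w, frame_h):
    """Turn one detection into reward inputs.

    bbox: (x_min, y_min, x_max, y_max) in pixels (placeholder format).
    Returns the box area as a fraction of the frame, the unsigned
    center-to-center distance, and the signed per-axis offsets.
    """
    x_min, y_min, x_max, y_max = bbox
    # Fraction of the frame the person occupies (proxy for closeness).
    area = (x_max - x_min) * (y_max - y_min) / (frame_w * frame_h)
    # Box center vs. frame center, normalized to [-1, 1].
    # Negative dx means the person is left of center.
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    dx = (cx - frame_w / 2) / (frame_w / 2)
    dy = (cy - frame_h / 2) / (frame_h / 2)
    distance = math.hypot(dx, dy)
    return area, distance, dx, dy
```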
I also don't like that the distance is unsigned, so the agent has no idea whether the bounding box is left or right of the frame center.
I have no experience creating reward functions (this is my first) and would love some guidance on how to make it ‘correct’.