Suppose I have two objectives (e.g., maximize state x while using minimal control action u) and I define a reward for each, something like:
reward_x = delta_x
reward_u = -u^2
I can train the network on the combined reward delta_x - u^2. But would it also be possible to define the two rewards separately and then combine them with a (weighted) sum? My intuition says this gives the network more "insight" into the dynamics/reward structure, so it could arrive at a policy that maximizes x while keeping u low.
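For concreteness, here is a minimal sketch of both formulations; the function and variable names are illustrative, not from any specific library:

```python
import numpy as np

def reward(delta_x, u, weights=(1.0, 1.0)):
    """Return the two reward components and their weighted-sum scalarization.

    delta_x : change in the state x we want to maximize (hypothetical signal)
    u       : control action whose magnitude should stay small
    weights : trade-off weights (w_x, w_u) used to scalarize the vector reward
    """
    r_x = delta_x        # component rewarding progress in x
    r_u = -u ** 2        # component penalizing control effort
    w_x, w_u = weights
    scalar = w_x * r_x + w_u * r_u  # what a standard scalar-reward agent trains on
    return np.array([r_x, r_u]), scalar

# Example: state increased by 0.5 while applying control u = 0.2
components, r = reward(0.5, 0.2)
```

With equal weights this reduces to the combined reward above, so the question is really whether exposing the components (the vector) to the learner, rather than only their sum, helps.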
However, I couldn't find any papers discussing multidimensional rewards.
Does anyone have an idea whether my proposed reward structure would be beneficial?