Suppose a human teacher wants to manually modify the agent's policy (policy shaping) to speed up its learning. Do I have to use off-policy methods, or can I get away with on-policy ones? Why?