I’d like to train an LSTM network to remember the steps that make up a trajectory for reaching an object. So far I’ve been using RL, so I’m thinking in terms of states and actions. For instance, could the agent be given an initial state, produce an action, recognize the sequence, and carry on with it? Or would it be better to train it to produce only states? But then, how could I control it?
Well, because I want to see whether the network can generalize, and so avoid computing the pseudo-inverse and the other calculations needed for planning.
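To make the state-to-action option concrete, here is a rough sketch of the loop I have in mind, written with plain NumPy. Everything in it is an assumption made for illustration: the `LSTMPolicy` class, the `rollout` helper, the 2-D state, and the `toy_dynamics` function standing in for the real environment are all hypothetical. The weights are random and untrained; they would presumably have to be fitted on recorded (state, action) pairs from successful trajectories.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMPolicy:
    """Untrained LSTM cell followed by a linear action head (illustrative only)."""

    def __init__(self, state_dim, action_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + hidden_dim
        # One stacked weight matrix for the four gates:
        # input, forget, cell candidate, output.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, in_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Linear head mapping the hidden state to an action.
        self.W_out = rng.normal(0.0, 0.1, size=(action_dim, hidden_dim))
        self.b_out = np.zeros(action_dim)
        self.hidden_dim = hidden_dim

    def initial_memory(self):
        # Hidden state h and cell state c both start at zero.
        return np.zeros(self.hidden_dim), np.zeros(self.hidden_dim)

    def step(self, state, memory):
        h, c = memory
        z = self.W @ np.concatenate([state, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        # tanh keeps every action component inside (-1, 1).
        action = np.tanh(self.W_out @ h + self.b_out)
        return action, (h, c)


def rollout(policy, initial_state, dynamics, n_steps):
    """Closed loop: state -> action -> next state, with the LSTM memory carried along."""
    state = np.asarray(initial_state, dtype=float)
    memory = policy.initial_memory()
    states, actions = [state], []
    for _ in range(n_steps):
        action, memory = policy.step(state, memory)
        state = dynamics(state, action)
        states.append(state)
        actions.append(action)
    return np.array(states), np.array(actions)


def toy_dynamics(state, action):
    # Hypothetical environment: the action is a small displacement of a 2-D point.
    return state + 0.1 * action


policy = LSTMPolicy(state_dim=2, action_dim=2, hidden_dim=8)
states, actions = rollout(policy, [0.0, 0.0], toy_dynamics, n_steps=5)
print(states.shape, actions.shape)
```

In this form the agent is still controlled through its actions, and the LSTM memory is what would let it recognize where it is in the sequence. A network that outputs only states would still need something to turn the predicted states into actions.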