I’m using an ensemble of several agents in parallel in an RL setting. Currently I’m doing it in the most naive way possible: keeping a list of agents and having each of them predict the same input (a single state) in a Python loop.
This is fairly slow, to the extent that you can visibly see the environment slowing down.
What is the most elegant way of doing this? Does anybody have experience with a similar problem? It’s not about training, which can be done efficiently; it’s just the trajectory generation that is enormously slow.
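To make the setup concrete, here is a toy sketch of the naive loop, with hypothetical linear "agents" (the `Agent`/`act` names and shapes are just for illustration). For this toy case, stacking the weights and doing a single batched matmul gives the same result in one call; what I’m unsure about is how to do this cleanly with full networks:

```python
import numpy as np

# Toy linear "agents" standing in for the real policy networks.
class Agent:
    def __init__(self, n_obs, n_act, rng):
        self.w = rng.standard_normal((n_obs, n_act))

    def act(self, state):
        # One forward pass per agent, all on the *same* state.
        return self.w.T @ state

rng = np.random.default_rng(0)
agents = [Agent(4, 2, rng) for _ in range(10)]
state = rng.standard_normal(4)

# Naive version: a Python loop, one prediction per agent.
preds = [a.act(state) for a in agents]
ensemble_action = np.mean(preds, axis=0)

# Batched version: stack the weights once, then a single einsum.
W = np.stack([a.w for a in agents])          # shape (10, 4, 2)
batched = np.einsum("kij,i->kj", W, state)   # shape (10, 2)
assert np.allclose(batched, np.stack(preds))
```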