I feel like this is such an obvious problem, but I can’t find any clear answers online. I have a Python class that conforms to OpenAI Gym’s environment API, but it’s written so that it receives one input action per step and returns one reward per step. How do I parallelize this environment? A few people suggested baselines or stable_baselines, but those don’t appear to work with PyTorch, and they’re currently broken by the switch to TensorFlow 2.0.
There are some other RL libraries (e.g. https://github.com/zuoxingdong/lagom, https://github.com/astooke/rlpyt), but they don’t appear to have professional support, so I’m concerned that if I use one, it’ll quickly become unusable.
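For concreteness, this is roughly the kind of wrapper I’ve been hacking together myself: one subprocess per environment copy, communicating over pipes. It’s only a minimal sketch assuming the usual Gym `reset()`/`step()` signature; `ParallelEnv`, `make_env`, and the worker protocol are just my own placeholder names, not from any library.

```python
import multiprocessing as mp

def _worker(remote, env_fn):
    # Each worker process owns one environment instance and serves requests.
    env = env_fn()
    while True:
        cmd, data = remote.recv()
        if cmd == "step":
            obs, reward, done, info = env.step(data)
            if done:
                # Auto-reset so the caller always gets a valid next observation.
                obs = env.reset()
            remote.send((obs, reward, done, info))
        elif cmd == "reset":
            remote.send(env.reset())
        elif cmd == "close":
            remote.close()
            break

class ParallelEnv:
    """Run N copies of a single-action Gym-style env in subprocesses."""

    def __init__(self, env_fns):
        self.parent_remotes, child_remotes = zip(*[mp.Pipe() for _ in env_fns])
        self.procs = [
            mp.Process(target=_worker, args=(child, fn), daemon=True)
            for child, fn in zip(child_remotes, env_fns)
        ]
        for p in self.procs:
            p.start()

    def reset(self):
        for remote in self.parent_remotes:
            remote.send(("reset", None))
        return [remote.recv() for remote in self.parent_remotes]

    def step(self, actions):
        # actions: one action per environment; results come back as lists.
        for remote, action in zip(self.parent_remotes, actions):
            remote.send(("step", action))
        results = [remote.recv() for remote in self.parent_remotes]
        obs, rewards, dones, infos = zip(*results)
        return list(obs), list(rewards), list(dones), list(infos)

    def close(self):
        for remote in self.parent_remotes:
            remote.send(("close", None))
        for p in self.procs:
            p.join()

# Usage sketch (MyEnv / make_env are placeholders; env_fns must be picklable,
# e.g. top-level functions, if the "spawn" start method is used):
#   def make_env():
#       return MyEnv()
#   envs = ParallelEnv([make_env] * 8)
#   obs = envs.reset()
#   obs, rewards, dones, infos = envs.step(actions)
```

Rolling this myself feels fragile compared to using something maintained, which is why I’m asking whether there’s a standard, supported way to do it.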