How to Vectorize/Parallelize Reinforcement Learning Environments?

I feel like this is such an obvious problem, but I can't find any clear answers. I have a Python class that conforms to OpenAI's environment API, but it's written to receive one action per step and return one reward per step. How do I parallelize this environment? I haven't been able to find any clear answer online. A few people suggested baselines or stable_baselines, but these don't appear to work with PyTorch, and they're currently broken by the switch to TensorFlow 2.0.

There are some other RL libraries (e.g., …), but they don't appear to have professional support, so I'm concerned that if I use one, it'll quickly become unusable.

I’m not an RL expert, but Catalyst is in the PyTorch ecosystem, so it might get some future support.
Have you had a look at this library already, and would it fit your needs?

I decided to simply vectorize my environments, and that appears to have given me the speed boost I need. I used the code provided here:
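(The original link isn't preserved here.) For reference, a minimal sketch of this kind of vectorization, assuming the classic Gym `reset()`/`step()` API: a wrapper holds N environment instances, takes a batch of actions, steps each sub-environment in a loop, and returns batched observations, rewards, and dones. `ToyEnv` and `SequentialVecEnv` are hypothetical names for illustration, not part of any library.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym-style env: reset() returns an
    observation; step(action) returns (obs, reward, done, info)."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation is just the timestep here

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}


class SequentialVecEnv:
    """Vectorized wrapper: steps each sub-env in a Python loop and
    returns batched results. Sub-envs are auto-reset on done so the
    batch always contains valid observations."""

    def __init__(self, env_fns):
        # env_fns: list of zero-argument callables that build an env
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if d:
                o = env.reset()  # auto-reset finished sub-env
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(info)
        return obs, rewards, dones, infos


vec = SequentialVecEnv([ToyEnv for _ in range(4)])
batch_obs = vec.reset()
batch_obs, batch_rewards, batch_dones, _ = vec.step([1.0, 1.0, 1.0, 1.0])
```

This sequential version still steps envs one at a time in Python; it mainly lets the agent consume batched tensors. For CPU-bound environments, the same interface can be backed by `multiprocessing` workers (this is roughly what libraries call a "subproc" vec env) for real parallel speedups.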