What is the most efficient way to collect samples in RL algorithms like PPO?

One of the main problems in reinforcement learning is that sample collection is slow, so the GPU often sits idle waiting for the CPU to produce samples. The obvious fix is to collect samples in parallel across several environments using multiple processes. This calls for a multi-process design in which each process holds its own actor (with the same parameters as all the others) and its own instance of the same environment.
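A minimal sketch of that design, under several assumptions: `RandomWalkEnv`, `act`, `worker` and `collect` are hypothetical names, the environment is a toy 1-D random walk rather than a real simulator, and the policy is a single scalar `theta`. It uses the standard library `multiprocessing`; `torch.multiprocessing` is a drop-in wrapper around it that also moves tensors through shared memory, so the same structure should carry over to a real PyTorch policy. The learner broadcasts the same parameters to every actor, then waits for all rollouts before the (omitted) PPO update, which keeps the data on-policy.

```python
import math
import multiprocessing as mp
import random


class RandomWalkEnv:
    """Hypothetical toy env: 1-D walk, reward is -|position|, episode ends at |pos| >= 5."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.pos = 0

    def reset(self):
        self.pos = self.rng.randint(-3, 3)
        return self.pos

    def step(self, action):
        self.pos += action  # action is -1 or +1
        return self.pos, -abs(self.pos), abs(self.pos) >= 5


def act(theta, obs, rng):
    # Hypothetical policy: P(action = +1) = sigmoid(theta * obs); theta is kept small here.
    p_up = 1.0 / (1.0 + math.exp(-theta * obs))
    return 1 if rng.random() < p_up else -1


def worker(conn, seed, steps_per_rollout):
    # Each process owns one environment and one copy of the actor.
    env = RandomWalkEnv(seed)
    rng = random.Random(seed + 1000)
    obs = env.reset()
    while True:
        theta = conn.recv()  # latest policy parameters from the learner
        if theta is None:  # shutdown signal
            break
        rollout = []
        for _ in range(steps_per_rollout):
            action = act(theta, obs, rng)
            next_obs, reward, done = env.step(action)
            rollout.append((obs, action, reward, done))
            obs = env.reset() if done else next_obs
        conn.send(rollout)
    conn.close()


def collect(num_workers=4, steps_per_rollout=64, iterations=3):
    # Assumption: POSIX "fork" start method so this runs as one plain script.
    # With CUDA or on Windows, use "spawn" plus an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    pipes, procs = [], []
    for i in range(num_workers):
        parent_conn, child_conn = ctx.Pipe()
        p = ctx.Process(target=worker, args=(child_conn, i, steps_per_rollout), daemon=True)
        p.start()
        pipes.append(parent_conn)
        procs.append(p)

    theta = -0.5
    history = []
    for _ in range(iterations):
        for conn in pipes:
            conn.send(theta)  # every actor gets the same parameters
        rollouts = [conn.recv() for conn in pipes]  # block until all actors finish
        history.append(rollouts)
        # A PPO update of theta on `rollouts` would go here (omitted).

    for conn in pipes:
        conn.send(None)
    for p in procs:
        p.join()
    return history
```

This is the synchronous variant: actors idle while the learner updates. Sending parameters through a pipe each iteration is simple but copies them; with `torch.multiprocessing` one can instead call `share_memory()` on the model so workers read the updated weights in place.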

How can I build such a system with PyTorch?

Great question @martin_xiao !
Have a look at torchrl: we provide a set of multi-processing utils to run environments in parallel (see our data collectors and parallel envs).
They are quite efficient and should suit your purpose. We also provide a PPO implementation by the way.
Don’t hesitate to reach out if you have any further questions!


Hi Vincent,

torchrl is splendid work!! I will look into it!