How to set up simple distributed reinforcement learning


A friend and I are working on an RL LSTM agent for a video game. However, the rounds can take a while, so we were hoping to have one of our machines (mine) run 24/7 (inside a VM, so we can still use our computers, since the agent needs to simulate mouse movement and clicking) and to have his machine connect sporadically to run its own instance of the game while contributing to the model's learning process.

Is this possible? If not, what is the closest alternative approach?

I have looked through the documentation, and while I understand how the current PyTorch distributed packages would work with a given dataset, I am not sure how that would work for our use case, since the model would need to update its parameters after each timestep in the game (each timestep is long: 1 second at the fastest, but often longer). Furthermore, since his computer would connect and disconnect, it would need some way of getting the latest model parameters…
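To make the "get the latest parameters on reconnect" idea concrete, here is a minimal sketch of a versioned parameter store. It is plain Python, not a real PyTorch API; the class and method names (`ParameterServer`, `pull`, `push`) are illustrative assumptions, and the update rule is bare SGD just to show the flow.

```python
import threading


class ParameterServer:
    """Holds the latest model parameters plus a version counter, so a
    worker that reconnects can tell whether its local copy is stale."""

    def __init__(self, params):
        self._lock = threading.Lock()
        self._params = dict(params)   # e.g. {"w": [1.0, 2.0]}
        self._version = 0

    def pull(self):
        """A worker calls this on (re)connect to get the newest weights."""
        with self._lock:
            return self._version, dict(self._params)

    def push(self, grads, lr=0.01):
        """Apply a worker's gradients (plain SGD, for illustration only)."""
        with self._lock:
            for name, g in grads.items():
                self._params[name] = [p - lr * gi
                                      for p, gi in zip(self._params[name], g)]
            self._version += 1
            return self._version


# Usage: a worker reconnects, compares versions, and re-syncs if stale.
ps = ParameterServer({"w": [1.0, 2.0]})
version, local = ps.pull()          # worker joins, grabs current weights
ps.push({"w": [0.5, 0.5]})          # another worker trains meanwhile
if ps.pull()[0] > version:          # reconnecting worker sees it's stale
    version, local = ps.pull()      # re-sync before continuing to play
```

The version counter is the key piece: a sporadically connected machine only needs to compare one integer to know whether it must re-download the weights before its next round.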

Any advice?

Does this mean you would need elasticity (allow worker processes to join/leave dynamically) in torch.distributed.rpc?

Yes, exactly. However, I found only a single mention of the term elasticity there. Am I looking in the wrong spot?

Hey @thelastspark, yep, that’s the right doc. Unfortunately, we don’t have elasticity support yet. @H-Huang and @Kiuk_Chung are actively working on that.

Besides allowing nodes to join/leave dynamically, do you have other requirements for RPC?

That was honestly the biggest one, because the RL model I am working on requires a single VM per instance (it interacts with the game by simulating mouse movements). I was hoping to have a main trainer instance running on my PC, spin up as many VMs on my local machine as I could during the day, then scale down to 1 overnight (just so the fans aren't making sleep impossible), and my friend would do the same thing.

Otherwise training would take literal ages, as the rounds can take anywhere from 20–45 minutes each, and we can only speed each game up so much before hitting a hard limit: the image recognition needed for certain events would miss them if the speed scale is too high.
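Since elasticity isn't supported in torch.distributed.rpc yet, one workaround for the setup described above (VMs joining during the day, dropping to one overnight) is a trainer that simply accepts gradient submissions from whichever workers happen to be connected, tagging each submission with the parameter version it was computed against and rejecting ones that are too stale. This is a hedged sketch, not a PyTorch feature; `ElasticTrainer`, `latest`, `submit`, and `max_staleness` are all made-up names.

```python
class ElasticTrainer:
    """Toy trainer tolerating workers that join and leave at will."""

    def __init__(self, params, lr=0.01, max_staleness=2):
        self.params = dict(params)    # e.g. {"w": [0.0]}
        self.version = 0
        self.lr = lr
        self.max_staleness = max_staleness

    def latest(self):
        """Any worker (a daytime VM or the single overnight one) calls
        this when it spins up, to fetch the current weights."""
        return self.version, dict(self.params)

    def submit(self, worker_version, grads):
        """Apply a worker's gradients unless they were computed against
        weights that are now too far behind; in that case the worker
        should call latest() and re-sync before trying again."""
        if self.version - worker_version > self.max_staleness:
            return False  # rejected: worker must pull fresh weights
        for name, g in grads.items():
            self.params[name] = [p - self.lr * gi
                                 for p, gi in zip(self.params[name], g)]
        self.version += 1
        return True


# Usage: three workers submit against version 0; by the fourth, the
# trainer has moved 3 versions ahead, so that stale gradient is refused.
trainer = ElasticTrainer({"w": [0.0]})
results = [trainer.submit(0, {"w": [1.0]}) for _ in range(4)]
```

The staleness bound is the usual knob in this kind of asynchronous scheme: loose enough that a sporadically connected machine still contributes, tight enough that very old gradients don't drag the model backwards.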