End-to-end A3C implementation with OpenAI Gym

Hey all :slight_smile: ,
I’m currently working on an implementation of A3C integrated with OpenAI Gym.
As some of the environments (e.g. CartPole-v0) don’t return an RGB array as an observation, I leverage env.render(mode='rgb_array') to obtain the pixel representation of the current game state.
However, this causes my worker processes to fail silently: the render call simply never executes and the program drops out of the training loop.
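For reference, this is roughly how I grab the frame. A minimal sketch with a dummy stand-in for the gym env (the exact preprocessing isn’t the issue here, only the render call is):

```python
import numpy as np

# DummyEnv is a hypothetical stand-in for a gym env whose
# render(mode='rgb_array') returns an HxWx3 uint8 frame
# (CartPole-v0 renders 400x600 by default).
class DummyEnv:
    def render(self, mode='rgb_array'):
        return np.zeros((400, 600, 3), dtype=np.uint8)

def get_screen(env):
    """Grab the current frame and convert it to a CHW float array in [0, 1]."""
    frame = env.render(mode='rgb_array')   # HxWx3, uint8
    frame = frame.transpose((2, 0, 1))     # -> CxHxW, the layout PyTorch expects
    return np.ascontiguousarray(frame, dtype=np.float32) / 255.0

screen = get_screen(DummyEnv())
print(screen.shape)  # (3, 400, 600)
```

With the real CartPole-v0 this get_screen works fine in a single process; it’s only inside the spawned workers that the render call dies.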

I was wondering whether anyone has already encountered this problem and found a fix for it. My current machine is a late-2013 MacBook Pro.
Tbh, I’m not that familiar with multiprocessing, but from what I have found on Stack Overflow, this might be related to the fact that you can’t call UI actions on a child thread.

The PyTorch tutorials show how to do it here: http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

Under the “Input Extraction” section.


Thanks for your input @dgriff!
However, this tutorial addresses the DQN algorithm, as opposed to the A3C algorithm I am trying to implement. The issue I ran into is that I can’t call render(mode='rgb_array') from different processes when using torch.multiprocessing to spawn multiple worker agents that asynchronously update the global model, as described by Mnih et al. (https://arxiv.org/pdf/1602.01783.pdf).

Ah ok, didn’t see you were working on A3C. Will try it on my A3C real quick and see if I have the same issue.

Yeah, couldn’t get it to work either. I think it’s as you say: you call the UI actions in the child threads. A3C is kinda overkill for CartPole anyway, try another environment :grin:

You can try my A3C repo out here if you’re interested: https://github.com/dgriff777/rl_a3c_pytorch
It has the top scores in the OpenAI Gym Atari games.


Too bad! Thanks for the effort though :slight_smile:
The overkill part is true, but I’d like to take the cart pole and turn it into a cart-pole swing-up env with some additional parameters for my master’s thesis, as swing-up is already part of my thesis’ title :smiley:
I guess I will dig into why e.g. PongDeterministic-v3 works seamlessly with multiple workers (I guess it’s due to the fact that it has a C++ backend which is called on the main thread) and try to apply that to the Python-only env. Maybe I can find a way to write directly into an image buffer, without having to perform the actual UI updates, or offload the rendering onto the main thread. Too bad I’m not that familiar with multiprocessing in Python nor with graphics :sweat_smile:

Pong works because the gym env is set up so you don’t have to call env.render to get RGB values; the states you receive are already the raw pixel values. Whereas for CartPole we need to grab an image and extract the raw values from that. “At least that’s how it looks to me from a quick glance at the gym env setup lol”
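To put it concretely (needs_render is just a hypothetical helper to illustrate the distinction): Pong-v0 observations are already raw frames of shape (210, 160, 3), while CartPole-v0 observations are a 4-dimensional state vector, so only the latter forces you through render():

```python
def needs_render(observation_shape):
    """Heuristic sketch: a 3-D HxWxC observation is already an image, while
    anything else (e.g. CartPole's 4-vector of positions/velocities) would
    need env.render(mode='rgb_array') to get pixels."""
    return len(observation_shape) != 3

print(needs_render((210, 160, 3)))  # False: Atari obs is already an image
print(needs_render((4,)))           # True: must grab pixels via render()
```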

That’s one way, but you can also call render(mode='rgb_array') in your run() method. You can check out this gist to reproduce :robot:

Yeah, I was saying it’s the fact that you need to use render(), which in turn uses rendering code that gets the RGB values from a created image, that I think causes the problem. Whereas the RGB values in Atari come from converting just the raw pixel data.