Hi,
I’m trying to implement the PPO algorithm on a simple custom MuJoCo environment in which a Tiago robot has to push a cube inside a circular area. I want to train the agent on the input frames returned by env.render(). However, since I could not find any GitHub implementation that uses frames with continuous action spaces, I wanted to ask whether the strategy used in the original DDPG paper also works for PPO.
In particular, the paper states:
“…In all tasks, we ran experiments using both a low-dimensional state description (such as joint angles and positions) and high-dimensional renderings of the environment. As in DQN (Mnih et al., 2013; 2015), in order to make the problems approximately fully observable in the high dimensional environment we used action repeats. For each timestep of the agent, we step the simulation 3 timesteps, repeating the agent’s action and rendering each time. Thus the observation reported to the agent contains 9 feature maps (the RGB of each of the 3 renderings) which allows the agent to infer velocities using the differences between frames. The frames were downsampled to 64x64 pixels and the 8-bit RGB values were converted to floating point scaled to [0, 1]. See supplementary information for details of our network structure and hyperparameters…”
I followed the same strategy for PPO because in Atari games the same frame-stacking approach is used regardless of the algorithm (e.g. A2C, A3C, DQN), and I assumed this would also hold for continuous action spaces. Roughly, my preprocessing looks like the sketch below.
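This is a minimal sketch of what I mean, assuming a Gymnasium-style wrapper around the custom env created with render_mode="rgb_array" (the wrapper name and defaults are just placeholders, not taken from any existing library):

```python
import numpy as np
import cv2
import gymnasium as gym


class RenderedActionRepeat(gym.Wrapper):
    """Action repeat + stacking of rendered RGB frames, as in the DDPG paper:
    repeat the action 3 times, render each step, concatenate the 3 RGB frames
    into a 9-channel observation, downsample to 64x64, scale to [0, 1]."""

    def __init__(self, env, repeat=3, size=64):
        super().__init__(env)
        self.repeat = repeat
        self.size = size
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=(3 * repeat, size, size), dtype=np.float32
        )

    def _frame(self):
        rgb = self.env.render()                        # HxWx3 uint8 frame
        rgb = cv2.resize(rgb, (self.size, self.size))  # downsample to 64x64
        return rgb.astype(np.float32) / 255.0          # 8-bit RGB -> [0, 1]

    def reset(self, **kwargs):
        _, info = self.env.reset(**kwargs)
        frame = self._frame()
        obs = np.concatenate([frame] * self.repeat, axis=-1)  # 9 channels
        return np.transpose(obs, (2, 0, 1)), info              # CHW for the CNN

    def step(self, action):
        frames, total_reward = [], 0.0
        terminated = truncated = False
        info = {}
        for _ in range(self.repeat):   # repeat the agent's action, render each step
            _, r, terminated, truncated, info = self.env.step(action)
            total_reward += r
            frames.append(self._frame())
            if terminated or truncated:
                # pad with the last frame if the episode ends mid-repeat
                frames += [frames[-1]] * (self.repeat - len(frames))
                break
        obs = np.concatenate(frames, axis=-1)
        return np.transpose(obs, (2, 0, 1)), total_reward, terminated, truncated, info
```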
Do you think this could work, or is it specific to DDPG?