In the official Q-Learning example, what does the env.unwrapped do exactly?

I know this might be a Gym question rather than a PyTorch one, but the OpenAI forum is somehow unavailable at the moment. Anyway, I hope someone can kindly help me out with this. Thank you so much in advance.


You can go through the code here:

As far as I know, there is a core superclass called gym.Env, and other subclasses of it implement the different environments (CartPoleEnv, MountainCarEnv, etc.). The unwrapped property is used to get the underlying gym.Env object out of a wrapped environment.


The unwrapped property just removes all the wrappers the environment instance has. In OpenAI Gym, you can wrap environments in a hierarchical manner. For example, you could use a Monitor wrapper like this:

    import tempfile

    import gym
    from gym import wrappers

    mdir = tempfile.mkdtemp()
    env = gym.make(env_name)  # env_name, e.g. 'CartPole-v0'
    env = wrappers.Monitor(env, mdir, force=True, mode=monitor_mode)  # monitor_mode, e.g. 'training'

This wrapper allows you to monitor training and it attaches videos at the end of training that you could access like this:

    for video_path, meta_path in env.videos:
        print(video_path, meta_path)

But you can add other “layers” or “wrappers” to do different things: for instance, to limit the maximum number of time steps per episode, to stack a number of the most recent observations/states/images (in the case of ATARI games), or to make parallelized instances of the environment (for A3C agents, for instance).
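As a rough illustration of the idea (a sketch, not Gym's actual TimeLimit implementation), a time-step-limiting wrapper is just a thin class that delegates to the environment it wraps and forces the episode to end once a step budget is spent. The `DummyEnv` and `TimeLimitSketch` names here are made up for the example:

```python
class DummyEnv:
    """Stand-in environment that never ends an episode on its own."""
    def reset(self):
        return 0  # initial observation

    def step(self, action):
        # observation, reward, done, info
        return 0, 1.0, False, {}


class TimeLimitSketch:
    """Sketch of a step-limiting wrapper: delegates to the inner env."""
    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self.max_episode_steps:
            done = True  # cut the episode off at the limit
        return obs, reward, done, info


env = TimeLimitSketch(DummyEnv(), max_episode_steps=3)
env.reset()
steps, done = 0, False
while not done:
    _, _, done, _ = env.step(0)
    steps += 1
# steps is now 3: the wrapper ended the episode, not the env itself
```

The agent code only ever talks to the outermost layer, which is why several such wrappers can be stacked transparently.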

The unwrapped call just removes all these layers and returns the raw/core environment.
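Conceptually (again a sketch, not Gym's actual source), unwrapped just walks down the chain of inner `.env` references until it reaches the core environment; `CoreEnv` and `Wrapper` here are hypothetical stand-ins for a raw env like CartPoleEnv and for gym.Wrapper:

```python
class CoreEnv:
    """Stand-in for a raw environment such as CartPoleEnv."""
    @property
    def unwrapped(self):
        return self  # the core env is its own unwrapped env


class Wrapper:
    """Stand-in for gym.Wrapper: holds an inner environment."""
    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        return self.env.unwrapped  # recurse until the core env answers


core = CoreEnv()
env = Wrapper(Wrapper(Wrapper(core)))  # three layers of wrapping
assert env.unwrapped is core           # all layers peeled away at once
```

So no matter how many wrappers were stacked, unwrapped always hands back the same raw object.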

I think all the developer is trying to do in the official example is remove the 200-time-step limit that the CartPole environment defaults to. Check out these links for more information: