In the official Q-Learning example, what does the env.unwrapped do exactly?

deepbluesome · November 3, 2018, 7:45pm

I know this might be a Gym question rather than a PyTorch one, but the OpenAI forum seems to be unavailable at the moment. Anyway, I hope someone can kindly help me out with this. Thank you so much in advance.

InnovArul · November 3, 2018, 11:19pm

You can go through the code in openai/gym/blob/master/gym/core.py on GitHub:

    from gym import logger

    import gym
    from gym import error
    from gym.utils import closer

    env_closer = closer.Closer()

    # Env-related abstractions

    class Env(object):
        """The main OpenAI Gym class. It encapsulates an environment with
        arbitrary behind-the-scenes dynamics. An environment can be
        partially or fully observed.

        The main API methods that users of this class need to know are:

            step
            reset
            render

(Excerpt only; the rest of the file is truncated.)

As far as I know, there is a core superclass called gym.Env, and there are subclasses of it that implement the different environments (CartPoleEnv, MountainCarEnv, etc.). The unwrapped property is used to get the underlying core environment object (for example, the CartPoleEnv instance) out of an environment that has been wrapped in other layers.
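The recursion behind unwrapped can be sketched in a few lines. This is a simplified, self-contained mimic that does not import gym; the class names MiniEnv, MiniWrapper, DummyCartPole and LoggingWrapper are hypothetical. It assumes the pattern in gym/core.py at the time: Env.unwrapped returns the environment itself, while Wrapper.unwrapped forwards the lookup to the environment it wraps.

```python
class MiniEnv:
    """Stand-in for gym.Env: a core environment with no wrappers."""

    @property
    def unwrapped(self):
        # A core environment is already "unwrapped", so it returns itself.
        return self


class MiniWrapper(MiniEnv):
    """Stand-in for gym.Wrapper: holds a reference to the env it wraps."""

    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        # Delegate inward; the recursion stops at the first non-wrapper env.
        return self.env.unwrapped


class DummyCartPole(MiniEnv):
    """Hypothetical core environment."""


class LoggingWrapper(MiniWrapper):
    """Hypothetical wrapper that adds no behaviour, only an extra layer."""


core = DummyCartPole()
env = LoggingWrapper(LoggingWrapper(core))  # two wrapper layers

print(type(env).__name__)            # LoggingWrapper
print(type(env.unwrapped).__name__)  # DummyCartPole
print(env.unwrapped is core)         # True
```

Note that unwrapped does not modify env: the wrappers are still there, and you simply get a reference to the innermost object.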

mimoralea · November 6, 2018, 8:51pm

The unwrapped property just gives you the environment with all of its wrappers stripped away. In OpenAI Gym, you can put wrappers around an environment in a hierarchical (nested) manner. For example, you could use a Monitor wrapper like this:

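    import tempfile
    import gym
    from gym import wrappers
    env_name, seed, monitor_mode = "CartPole-v0", 0, "training"  # example values
    mdir = tempfile.mkdtemp()  # temporary output directory for the Monitor
    env = gym.make(env_name)
    env = wrappers.Monitor(env, mdir, force=True, mode=monitor_mode)  # add a layer
    env.seed(seed)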

This wrapper lets you monitor training: it records videos while the agent trains, and afterwards you can access them like this:

    for video_path, meta_path in env.videos:
        print(video_path, meta_path)

But you could add other “layers” or “wrappers” to do different things: for instance, to limit the maximum number of time steps per episode, to stack the most recent observations/states/images (in the case of Atari), or to make parallelized instances of the environment (e.g. for A3C agents).
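As an illustration of one such layer, here is a minimal sketch of an observation-stacking wrapper. It is self-contained and does not use gym; MiniEnv, MiniWrapper, CounterEnv and StackObservations are hypothetical names, and the reset()/step() signatures assume the older four-value Gym step return (observation, reward, done, info). Real libraries ship their own versions of such wrappers, and this is not their implementation.

```python
from collections import deque


class MiniEnv:
    """Stand-in for gym.Env (old 4-tuple step API assumed)."""

    @property
    def unwrapped(self):
        return self


class MiniWrapper(MiniEnv):
    """Stand-in for gym.Wrapper: forwards calls to the wrapped env."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)

    @property
    def unwrapped(self):
        return self.env.unwrapped


class CounterEnv(MiniEnv):
    """Hypothetical core env whose observation is just the step count."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, {}


class StackObservations(MiniWrapper):
    """Returns the k most recent observations as a tuple (oldest first)."""

    def __init__(self, env, k):
        super().__init__(env)
        self.frames = deque(maxlen=k)  # old frames fall off automatically

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)  # pad the stack with the first observation
        return tuple(self.frames)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return tuple(self.frames), reward, done, info


env = StackObservations(CounterEnv(), k=3)
print(env.reset())            # (0, 0, 0)
print(env.step(None)[0])      # (0, 0, 1)
print(env.step(None)[0])      # (0, 1, 2)
print(env.unwrapped.reset())  # 0  (the raw env does no stacking)
```

Because each layer only talks to the env it wraps, layers like this can be stacked in any order, and unwrapped still reaches the core env underneath all of them.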

Accessing unwrapped just skips all these layers and returns the raw/core environment.

I think that in the official example, all the developer is trying to do is remove the 200-time-step limit that the CartPole environment defaults to. Check out these links for more information:

https://github.com/openai/gym/blob/master/gym/envs/__init__.py#L53-L58

https://github.com/openai/gym/blob/master/gym/envs/registration.py#L49-L54
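A self-contained sketch of the effect described above, assuming (as the registration code linked above indicates) that gym.make wraps CartPole in a wrapper that ends episodes after 200 steps. No gym import is used; MiniEnv, MiniWrapper, NeverEndingEnv, StepLimit and episode_length are hypothetical stand-ins, and the four-value step return follows the older Gym API. Episodes through the wrapper stop at 200 steps, while stepping env.unwrapped bypasses the limit.

```python
class MiniEnv:
    """Stand-in for gym.Env."""

    @property
    def unwrapped(self):
        return self


class MiniWrapper(MiniEnv):
    """Stand-in for gym.Wrapper."""

    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        return self.env.unwrapped


class NeverEndingEnv(MiniEnv):
    """Hypothetical core env that never signals done on its own."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 1.0, False, {}


class StepLimit(MiniWrapper):
    """Hypothetical time-limit wrapper: forces done=True after max_steps."""

    def __init__(self, env, max_steps):
        super().__init__(env)
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps:
            done = True  # cut the episode off at the limit
        return obs, reward, done, info


def episode_length(env, cap=1000):
    """Count steps until done, giving up after `cap` steps."""
    env.reset()
    for t in range(1, cap + 1):
        _, _, done, _ = env.step(None)
        if done:
            return t
    return cap


env = StepLimit(NeverEndingEnv(), max_steps=200)
print(episode_length(env))            # 200: the wrapper ends the episode
print(episode_length(env.unwrapped))  # 1000: no limit, stops only at our cap
```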