DQN tutorial with no cheat

In the DQN tutorial (https://github.com/pytorch/tutorials/blob/master/Reinforcement%20(Q-)Learning%20with%20PyTorch.ipynb), at one point it is suggested to crop the image around the object of interest, even though we would expect the algorithm to extract this feature by itself. If I try to remove this “cheating” trick (commented in the following code)…

import gym
import numpy as np
import torch
import torchvision.transforms as T
from PIL import Image
import matplotlib.pyplot as plt

env = gym.make('CartPole-v0').unwrapped
resize = T.Compose([T.ToPILImage(), T.Scale(40, interpolation=Image.CUBIC), T.ToTensor()])

# This is based on the code from gym.
screen_width = 600
def get_cart_location():
    world_width = env.x_threshold * 2
    scale = screen_width / world_width
    return int(env.state[0] * scale + screen_width / 2.0) # MIDDLE OF CART
def get_screen():
    screen = env.render(mode='rgb_array').transpose((2, 0, 1)) # transpose into torch order (CHW)
    # Strip off the top and bottom of the screen
    # this is the trick:
    screen = screen[:, 160:320]
    view_width = 320
    cart_location = get_cart_location()
    if cart_location < view_width // 2:
        slice_range = slice(view_width)
    elif cart_location > (screen_width - view_width // 2):
        slice_range = slice(-view_width, None)
    else:
        slice_range = slice(cart_location - view_width // 2, cart_location + view_width // 2)
    # Strip off the edges, so that we have a square image centered on a cart
    screen = screen[:, :, slice_range]
    # Convert to float, rescale, convert to torch tensor (this doesn't require a copy)
    screen = np.ascontiguousarray(screen, dtype=np.float32) / 255
    screen = torch.from_numpy(screen)
    # Resize, and add a batch dimension (BCHW)
    return resize(screen).unsqueeze(0)

plt.imshow(get_screen().squeeze(0).permute(1, 2, 0).numpy(), interpolation='none')

… I can’t obtain any learning. I tried several different combinations of parameters, and I also tried changing the structure of the network, but there was no way to reach an acceptable result. Is there a way to make it work?
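To be concrete, here is roughly what I mean by “no crop”: feed the whole rendered frame, only downscaled. This is just a sketch, not the tutorial’s code; it uses torch’s `F.interpolate` instead of the torchvision `Scale` transform, and a dummy numpy array in place of `env.render(mode='rgb_array')` so it runs without a display.

```python
import numpy as np
import torch
import torch.nn.functional as F

def get_screen_full(frame):
    # frame: HWC uint8 RGB array, as env.render(mode='rgb_array') would return
    screen = frame.transpose((2, 0, 1))              # HWC -> CHW (torch order)
    screen = np.ascontiguousarray(screen, dtype=np.float32) / 255
    screen = torch.from_numpy(screen).unsqueeze(0)   # add batch dim -> BCHW
    # Downscale to 40 px tall, keeping the aspect ratio; no cart-centered crop
    h, w = screen.shape[2], screen.shape[3]
    return F.interpolate(screen, size=(40, int(w * 40 / h)),
                         mode='bilinear', align_corners=False)

# A dummy 400x600 frame stands in for env.render so the sketch runs headless
dummy = np.zeros((400, 600, 3), dtype=np.uint8)
out = get_screen_full(dummy)
print(out.shape)  # torch.Size([1, 3, 40, 60])
```

With this the network sees the full scene, so it has to learn where the cart is on its own, which is exactly where training seems to break down.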

That’s the primary reason why the input is cropped :slight_smile:
That’s how it is with RL, it’s quite unstable.


Haha, I take your response as a challenge! I will find a way.

Hey, did you find a way to make the agent learn without the trick?