Hello, complete beginner here. I’m trying to implement Deep Q-Learning on the FrozenLake environment.
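For context, here's a minimal sketch of my environment setup (I'm assuming the standard Gymnasium FrozenLake-v1 here; exact version details aside):

```python
import gymnasium as gym

env = gym.make("FrozenLake-v1")
print(env.observation_space)  # Discrete(16): state index 0..15
print(env.action_space)       # Discrete(4): left, down, right, up
```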
The state space is Discrete(16) and the action space is Discrete(4), so I initially designed a simple model:
```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 128),  # single scalar input: the raw state index
    nn.Sigmoid(),
    nn.Linear(128, 4),  # one Q-value per action
)
```
However, it failed to learn anything useful. I spent nearly three days on this without success. I'm confident the core of my DQL algorithm is correct (I even tried copying code from working solutions).
After reviewing other implementations, I noticed that everyone used one-hot encoding for the state: instead of feeding the raw state index (0-15) to the model directly, they feed its one-hot representation. When I applied this, my model suddenly started working and successfully solved the environment.
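Concretely, the working version looks roughly like this (a minimal sketch; the helper name and layer sizes are illustrative, not my exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(16, 128),  # 16 inputs: one per discrete state
    nn.Sigmoid(),        # keeping my original activation
    nn.Linear(128, 4),   # one Q-value per action
)

def encode(state: int) -> torch.Tensor:
    # Turn state index 0..15 into a 16-dim one-hot vector
    return F.one_hot(torch.tensor(state), num_classes=16).float()

q_values = model(encode(3))  # Q-values for all 4 actions in state 3
```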
But I don’t understand why one-hot encoding is necessary. I found this comment:
I kinda understand it, but I still don't get why the model fails to "memorize"/"brute-force" the correct action for each state without one-hot encoding. I also tried adding more hidden layers to help the network learn a better representation of the discrete states, but that didn't work either; a sketch of that variant follows below.
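The deeper scalar-input variant I tried looked roughly like this (layer count and sizes are illustrative, not my exact code):

```python
import torch.nn as nn

deeper = nn.Sequential(
    nn.Linear(1, 128),   # still feeding the raw state index as one scalar
    nn.Sigmoid(),
    nn.Linear(128, 128),  # extra hidden layer, same failure
    nn.Sigmoid(),
    nn.Linear(128, 4),
)
```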
Can someone explain why one-hot encoding is essential in this case?
Is learning a gridworld problem like this without one-hot encoding even possible?