Hello, complete beginner here. I’m trying to implement Deep Q-Learning on the FrozenLake environment.
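For context, here's a minimal sketch of my environment setup (I'm assuming the standard Gymnasium FrozenLake-v1 here; exact version details aside):

```python
import gymnasium as gym

env = gym.make("FrozenLake-v1")
print(env.observation_space)  # Discrete(16): state index 0..15
print(env.action_space)       # Discrete(4): left, down, right, up
```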
The state space is Discrete(16) and the action space is Discrete(4), so I initially designed a simple model:
```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 128),  # single scalar input: the raw state index
    nn.Sigmoid(),
    nn.Linear(128, 4),  # one Q-value per action
)
```
However, it failed to learn anything useful. I spent nearly three days on this without success. I'm confident the core of my DQL algorithm is correct (I even tried copying code from working solutions).
After reviewing other implementations, I noticed that everyone used one-hot encoding for the state: instead of feeding the raw state index (0-15) to the model directly, they feed its one-hot representation. When I applied this, my model suddenly started working and successfully solved the environment.
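Concretely, the working version looks roughly like this (a minimal sketch; the helper name and layer sizes are illustrative, not my exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(16, 128),  # 16 inputs: one per discrete state
    nn.Sigmoid(),        # keeping my original activation
    nn.Linear(128, 4),   # one Q-value per action
)

def encode(state: int) -> torch.Tensor:
    # Turn state index 0..15 into a 16-dim one-hot vector
    return F.one_hot(torch.tensor(state), num_classes=16).float()

q_values = model(encode(3))  # Q-values for all 4 actions in state 3
```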
But I don’t understand why one-hot encoding is necessary. I found this comment:
I kinda understand it, but I still don't get why the model fails to "memorize"/"brute-force" the correct action for each state without one-hot encoding. I also tried adding more hidden layers to help the network learn a better representation of the discrete states, but that didn't work either; a sketch of that variant follows below.
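The deeper scalar-input variant I tried looked roughly like this (layer count and sizes are illustrative, not my exact code):

```python
import torch.nn as nn

deeper = nn.Sequential(
    nn.Linear(1, 128),   # still feeding the raw state index as one scalar
    nn.Sigmoid(),
    nn.Linear(128, 128),  # extra hidden layer, same failure
    nn.Sigmoid(),
    nn.Linear(128, 4),
)
```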
Can someone explain why one-hot encoding is essential in this case?
Is learning a gridworld problem like this without one-hot encoding even possible?