I made a custom environment in Unity with simple grid-based mazes like this. The agent can only see its immediate surroundings, through vector observations rather than images. It casts rays (red in the picture) in 8 directions and, for each ray, gets what it hit (wall, exit, or nothing) and the distance to the hit point. The number of rooms discovered so far is also included in the observations. As for rewards, it gets -1 every step, plus positive rewards for discovering new rooms and for finding the exit. The goal is to explore the maze and find the exit.
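Roughly, the observation and reward setup described above could be sketched like this in Python. This is only an illustration, not the actual Unity code: the names, the one-hot encoding of the hit type, the normalized distance, and the reward magnitudes for new rooms and the exit are all assumptions. Only the 8 rays, the three hit types, the room count, and the -1 per step come from the description.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Sequence

NUM_RAYS = 8  # rays are cast in 8 directions around the agent


class Hit(IntEnum):
    NOTHING = 0
    WALL = 1
    EXIT = 2


@dataclass
class RayResult:
    hit: Hit
    distance: float  # assumed normalized to [0, 1]; 1.0 when nothing was hit


def build_observation(rays: Sequence[RayResult], rooms_discovered: int) -> List[float]:
    """Flatten ray hits, distances, and the room count into one vector."""
    if len(rays) != NUM_RAYS:
        raise ValueError(f"expected {NUM_RAYS} rays, got {len(rays)}")
    obs: List[float] = []
    for ray in rays:
        one_hot = [0.0] * len(Hit)  # assumed encoding of the hit type
        one_hot[ray.hit] = 1.0
        obs.extend(one_hot)
        obs.append(ray.distance)
    obs.append(float(rooms_discovered))
    return obs


STEP_PENALTY = -1.0      # from the description: -1 every step
NEW_ROOM_REWARD = 10.0   # assumed magnitude, not given in the post
EXIT_REWARD = 100.0      # assumed magnitude, not given in the post


def step_reward(discovered_new_room: bool, found_exit: bool) -> float:
    """Per-step reward: constant step penalty plus event bonuses."""
    reward = STEP_PENALTY
    if discovered_new_room:
        reward += NEW_ROOM_REWARD
    if found_exit:
        reward += EXIT_REWARD
    return reward
```

With this encoding the observation has 8 × (3 + 1) + 1 = 33 entries, and a step with no event yields a reward of -1.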
The problem is that the agent keeps sticking to walls, circling around, and essentially acting randomly. I’ve tried doubling the penalty for staying in one room, adding a positive reward for high velocity, and other things. What else can I try? I have full control over the environment and I’m not bound to this exact agent design. I’m using SAC with automatic entropy tuning.