What kind of dataset do I need to train a NN to play a game?

Hille · August 16, 2018, 5:42am

I want to train a NN to play the game Snake. I know that there are a lot of implementations out there, but I want to do it on my own to understand every basic step.

I already have the Snake game. Now I want to generate data from it, but I don’t know what data I have to generate so that I can train a NN to predict the snake’s next step.

I have no clue what data I need. Are the position of the snake’s head and the position of the apple enough? How do I define the label? What do I minimise? I could minimise the distance between the apple and the snake’s head, but then the network does not learn to predict a direction. It also would not learn to avoid crashing into walls or into itself.

Another approach would be to give the snake eyes. That means the data would contain information about the pixels directly around the snake’s head, for example whether each one is a wall or part of its body. In addition, I could include the position of the apple. With that input data I want the NN to predict the next direction. For that I could think about rewards: e.g. 10 points for a predicted direction in which the snake does not die, and 100 points if the NN finds an apple. That means I would train the NN to maximise this reward, but in this case where do I get my direction from?
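
To make the "eyes" idea concrete, here is a rough sketch of what such an input vector and reward could look like. It assumes a grid-based game with `(x, y)` cell coordinates where y grows downward. The names `encode_state` and `reward`, the six-value feature layout, and the -100 penalty for dying are illustrative assumptions only; the 10 and 100 point values are the ones from the paragraph above.

```python
# Directions as (dx, dy) grid offsets; y grows downward (an assumption).
DIRECTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # up, right, down, left


def encode_state(head, body, apple, width, height):
    """Return 4 danger flags (one per direction) plus the apple offset signs."""
    hx, hy = head
    features = []
    for dx, dy in DIRECTIONS:
        nx, ny = hx + dx, hy + dy
        hits_wall = not (0 <= nx < width and 0 <= ny < height)
        hits_body = (nx, ny) in body
        features.append(1.0 if hits_wall or hits_body else 0.0)
    ax, ay = apple
    # Sign of the apple offset on each axis: -1.0, 0.0 or 1.0.
    features.append(float((ax > hx) - (ax < hx)))
    features.append(float((ay > hy) - (ay < hy)))
    return features


def reward(died, ate_apple):
    """Reward scheme from the post; the death penalty is an assumed value."""
    if died:
        return -100.0  # assumption: dying is penalised
    if ate_apple:
        return 100.0
    return 10.0
```

The network would take the six features as input and output one score per direction. There is no label in this setup: the direction is whatever the network itself picks, and the reward says how good that pick was.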

If someone could give me some advice, that would be great. I found a few implementations of this problem on the internet, but none of them are well explained or documented.

Just for context: I have solved image classification and segmentation tasks with CNNs, so I already know a lot about how NNs work.

Ty

justusschock · August 16, 2018, 10:45am

This is usually done with reinforcement learning. For this you need an environment that can react to your network’s predictions. The environment must provide the current state (e.g. an image of the current game situation), which can be fed to the CNN. It must also accept actions (the NN’s predictions) and calculate a reward for each action. The optimization goal would be to maximize the number of steps the snake survives without a game over (and maybe give a higher reward for an apple).
You could have a look at the environments of OpenAI Gym. Even if they don’t have such an environment, you could look at other games or start with the CartPole environment.
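
As a rough sketch of what such an environment could look like for Snake: the class below is hypothetical (it is not part of OpenAI Gym) and only imitates the `reset`/`step` pattern in a simplified form, with `step` returning `(state, reward, done)`. The grid encoding and the reward values (10 per surviving step, 100 for an apple, -100 for dying) are assumptions chosen for illustration.

```python
import random

ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # up, right, down, left


class SnakeEnv:
    """Hypothetical minimal Snake environment with a Gym-like reset/step API."""

    def __init__(self, size=8, seed=0):
        self.size = size
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        mid = self.size // 2
        self.snake = [(mid, mid)]  # head is the first element
        self._place_apple()
        return self._state()

    def _place_apple(self):
        # Assumes the board is never completely filled by the snake.
        free = [(x, y) for x in range(self.size) for y in range(self.size)
                if (x, y) not in self.snake]
        self.apple = self.rng.choice(free)

    def _state(self):
        # Grid encoding: 0 = empty, 1 = snake, 2 = apple.
        grid = [[0] * self.size for _ in range(self.size)]
        for x, y in self.snake:
            grid[y][x] = 1
        ax, ay = self.apple
        grid[ay][ax] = 2
        return grid

    def step(self, action):
        dx, dy = ACTIONS[action]
        hx, hy = self.snake[0]
        new_head = (hx + dx, hy + dy)
        out = not (0 <= new_head[0] < self.size and 0 <= new_head[1] < self.size)
        if out or new_head in self.snake:
            return self._state(), -100.0, True  # assumed death penalty
        self.snake.insert(0, new_head)
        if new_head == self.apple:
            self._place_apple()  # snake grows: the tail is kept
            return self._state(), 100.0, False
        self.snake.pop()  # normal move: drop the tail
        return self._state(), 10.0, False


if __name__ == "__main__":
    env = SnakeEnv(size=8, seed=0)
    state, done, total, steps = env.reset(), False, 0.0, 0
    while not done and steps < 200:
        action = random.randrange(4)  # a trained network would choose here
        state, r, done = env.step(action)
        total += r
        steps += 1
    print("steps:", steps, "total reward:", total)
```

The loop at the bottom plays with random actions. In training, the random choice would be replaced by the network's prediction, and the collected `(state, action, reward)` data is what the network learns from, so no hand-labelled dataset is needed.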