I am trying to create an AI to play a game I made that is similar to tic-tac-toe. The AI chooses its move based on a set of scores that it outputs, one for each possible play. I chose not to explicitly prevent it from playing on a tile that has already been taken; instead, I am trying to train it to learn not to play on occupied tiles.
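To make the setup concrete, here is a simplified sketch of the move selection described above. It is not my actual code: the flattened 3x3 board, the `choose_move` helper, and the score values are all assumptions made up for illustration.

```python
def choose_move(scores):
    """Return the index of the highest-scoring tile, with no legality check."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical 3x3 board flattened to 9 tiles: 0 = empty, 1 or -1 = taken.
board = [1, 0, 0, 0, -1, 0, 0, 0, 0]

# Made-up network output, one score per tile.
scores = [0.9, 0.2, 0.1, 0.4, 0.3, 0.0, 0.5, 0.1, 0.2]

move = choose_move(scores)
is_legal = board[move] == 0  # nothing stops the AI from picking a taken tile
```

In this example the highest score sits on a tile that is already taken, which is exactly the situation I want the training to discourage.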
I have run some tests with different losses. It seems to work when I tell it to minimize the error directly against a target of 1 for available tiles and -1 for unavailable ones. However, when I base the loss on the actual state of the board and the moves that are made, the AI seems to behave randomly and does not learn, and I am unsure why. Are there any obvious problems with my code or logic, and how can I fix them?
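For reference, the version that does work can be sketched roughly like this. It assumes a mean squared error and the same hypothetical flattened board encoding (0 = empty); `availability_target` and `mse` are illustrative names, not functions from my real code.

```python
def availability_target(board):
    """Target score per tile: 1 if the tile is empty, -1 if it is taken."""
    return [1.0 if tile == 0 else -1.0 for tile in board]

def mse(scores, target):
    """Mean squared error between predicted scores and the target."""
    return sum((s - t) ** 2 for s, t in zip(scores, target)) / len(scores)

# Hypothetical board: tiles 0 and 4 are already taken.
board = [1, 0, 0, 0, -1, 0, 0, 0, 0]
target = availability_target(board)

# An untrained network that outputs all zeros is off by 1 on every tile.
loss = mse([0.0] * 9, target)
```

With this kind of target every tile gets its own error signal on every step, which may be part of why it trains more easily than the loss based on the moves actually played.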
My code is below: