Can I train a CNN model for RL?

Hi there!
I am working on a CNN model that takes a chessboard as input (an array containing the location of each piece type) and outputs the value of the position (from 0 to 1), so it is a regression problem. However, to train it with Reinforcement Learning, I would like the model to take as inputs the positions the chess engine is ‘seeing’ while it plays training games against itself. The model will therefore ‘see’ single inputs (chessboard positions, each of size 8x8x13) one at a time, rather than a complete dataset. Furthermore, the network architecture will have to figure out the value of each position on its own, without any help or target value. How can this be done?
Thanks,
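For concreteness, here is a minimal sketch of how a position could be turned into the 8x8x13 input described above. It assumes positions arrive as FEN strings and uses a hypothetical plane layout that the post does not specify: planes 0-11 hold one-hot locations for the 12 piece types (white P, N, B, R, Q, K, then black p, n, b, r, q, k), and plane 12 encodes the side to move.

```python
import numpy as np

# Hypothetical plane layout (an assumption, not taken from the post):
# planes 0-5 = white P,N,B,R,Q,K; planes 6-11 = black p,n,b,r,q,k;
# plane 12 = side to move (all ones if White is to move).
PIECE_TO_PLANE = {piece: i for i, piece in enumerate("PNBRQKpnbrqk")}


def encode_fen(fen: str) -> np.ndarray:
    """Encode a FEN string as an (8, 8, 13) float32 array."""
    fields = fen.split()
    placement = fields[0]
    side_to_move = fields[1] if len(fields) > 1 else "w"

    planes = np.zeros((8, 8, 13), dtype=np.float32)
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("FEN piece placement must contain 8 ranks")

    # FEN lists rank 8 first, so row 0 is rank 8 and row 7 is rank 1.
    for row, rank in enumerate(ranks):
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)  # a digit means that many empty squares
            else:
                planes[row, col, PIECE_TO_PLANE[ch]] = 1.0
                col += 1
        if col != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")

    if side_to_move == "w":
        planes[:, :, 12] = 1.0
    return planes
```

For example, `encode_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")` yields 32 ones across planes 0-11, one per piece on the starting board. Putting the side to move in the extra plane is only one option; the 13th plane could just as well mark empty squares.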

figure out the value of each position on its own, without any help or target value.

One way this has been done before is to make winning the game the “target”: board positions from each game are labeled by whether or not they resulted in a win. There’s an explanation here of an approach that used a database of 640,000 actual games to train its evaluation function:
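The same idea carries over to self-play: once a training game ends, every position seen in that game gets the final result as its target, and the model can then be updated one position at a time. Below is a minimal sketch under several assumptions: results are scored from White’s point of view (1.0 White win, 0.5 draw, 0.0 Black win), and a logistic-regression model on the flattened 8x8x13 board stands in for the CNN so the example stays dependency-free. `label_game` and `OnlineValueModel` are hypothetical names, not from any library.

```python
import numpy as np


def label_game(positions, result):
    """Attach the final game outcome to every position from that game.

    positions: list of encoded boards, e.g. arrays of shape (8, 8, 13).
    result: 1.0 if White won, 0.5 for a draw, 0.0 if Black won
            (values from White's point of view -- an assumption).
    """
    return [(x, result) for x in positions]


class OnlineValueModel:
    """Stand-in for the CNN: logistic regression on the flattened board.

    Outputs a value in (0, 1) and is trained one position at a time.
    """

    def __init__(self, n_inputs=8 * 8 * 13, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.01, size=n_inputs)
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = float(self.w @ x.ravel()) + self.b
        return 1.0 / (1.0 + np.exp(-z))

    def update(self, x, target):
        """One SGD step on binary cross-entropy for a single (x, target) pair."""
        p = self.predict(x)
        grad = p - target  # derivative of BCE w.r.t. the pre-sigmoid output z
        self.w -= self.lr * grad * x.ravel()
        self.b -= self.lr * grad
        return p
```

After each self-play game you would call `for x, y in label_game(game_positions, result): model.update(x, y)`. In a real engine the linear model would be replaced by the CNN trained with the same binary cross-entropy loss; labeling from the side-to-move’s perspective instead of White’s is an equally common choice.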