Which ML model is best suited to my situation (explained below)?
The ML model should decide whether to drill a hole in the ground in an effort to reach an oil reserve. It should be rewarded if it correctly finds oil and punished if it destroys a pipe. At any given moment it can take one of two actions: drill or not drill.
Note:
Rewarded if: it decided to drill and found oil, or decided not to drill and there was a pipe below it
Punished if: it decided to drill and there was a pipe below it, or decided not to drill and there was oil below it (missed opportunity)
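To make the reward rule concrete, here is a minimal sketch of it as a function. The magnitudes (+1 and -1), the function name `reward`, and the labels `"oil"` and `"pipe"` are assumptions chosen for illustration; the actual values could be weighted differently.

```python
def reward(drill: bool, ground: str) -> int:
    """Return the reward for one decision.

    `ground` is what lies below: "oil" or "pipe" (hypothetical labels).
    The +1 / -1 magnitudes are assumed, not fixed by the problem.
    """
    if ground not in ("oil", "pipe"):
        raise ValueError(f"unknown ground type: {ground!r}")
    # Correct decisions: drill on oil, or hold back on a pipe.
    correct = (drill and ground == "oil") or (not drill and ground == "pipe")
    return 1 if correct else -1


# Enumerate all four action/outcome combinations.
for action in (True, False):
    for below in ("oil", "pipe"):
        print(f"drill={action}, below={below}: reward={reward(action, below)}")
```

Note that the reward depends only on the current action and what is below, not on any earlier decision.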
I tried to create a Deep Q-Learning model, but it isn't the right fit, since the current action does NOT affect the next state. Please comment on which model would work best for this situation!