What ML model is optimal for this situation?

What ML model is optimal for my situation (explained below)?
The ML model should decide whether to drill a hole in the ground in an effort to reach an oil reserve. It should be rewarded if it correctly finds oil and punished if it destroys a pipe. At any given moment it has exactly two choices: drill or don't drill.

Note:
Rewarded if: it decided to drill and found oil, or it decided not to drill and there was a pipe below it
Punished if: it decided to drill and there was a pipe below it, or it decided not to drill and there was oil below it (missed opportunity)
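
The four cases in the note can be written down as a small reward function. This is only a sketch: the name `reward`, the `"oil"`/`"pipe"` labels, and the +1/-1 magnitudes are assumptions for illustration, since the post only says "rewarded" or "punished".

```python
def reward(drill: bool, below: str) -> int:
    """Return +1 or -1 for a single decision, following the four cases above.

    `below` is a hypothetical label, either "oil" or "pipe". The +1/-1
    magnitudes are assumed; the post does not give actual values.
    """
    if below not in ("oil", "pipe"):
        raise ValueError(f"unexpected ground content: {below!r}")
    # Correct decisions: drill over oil, or hold back over a pipe.
    correct = (drill and below == "oil") or (not drill and below == "pipe")
    return 1 if correct else -1
```

For example, `reward(True, "pipe")` is the "destroyed a pipe" case and `reward(False, "oil")` is the missed opportunity; both are punished.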

I tried to create a Deep Q-Learning model, but I don’t think it is the right model because the current action does NOT affect the next state. Please comment on which model would work best for this situation!

hello @ML_Motivation
I don’t really get why the action wouldn’t affect the next state.
IMO it does; it’s just that the trajectory is very short (a single step). In other words, your value function is identical to your expected reward, and you should be able to use some sort of tabular Q-learning.
Hope that helps
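
To make the one-step point concrete, here is a rough sketch of tabular Q-learning when every episode ends after a single action. There is no next state to bootstrap from, so the usual target `r + gamma * max Q(s', a')` collapses to just `r`. The site names, the +1/-1 rewards, and the hyperparameters are all made up for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("drill", "skip")

def reward(action, below):
    # Assumed +1/-1 rewards: correct means drilling over oil or skipping a pipe.
    correct = (action == "drill") == (below == "oil")
    return 1.0 if correct else -1.0

def train(samples, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
    for _ in range(episodes):
        state, below = rng.choice(samples)
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        r = reward(action, below)
        # One-step episode: the target is the reward itself, no bootstrap term.
        q[state][action] += alpha * (r - q[state][action])
    return q

# Hypothetical toy data: (state id, what is actually underground).
samples = [("site_a", "oil"), ("site_b", "pipe")]
q = train(samples)
best = {s: max(ACTIONS, key=lambda a: q[s][a]) for s, _ in samples}
```

After training, `q[state][action]` is a running estimate of the expected reward for that pair, which is what "the value function is identical to the reward" means here.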

There are millions of possible states in this situation, so I think a Q-table would be inefficient and an Artificial Neural Network (ANN) is needed instead.
We evaluate each location of the agent independently, so its choice at one location does not influence the next state.
What model would be best for this situation? We tried Deep Q-Learning but are not sure it’s the best model.
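
For reference, this is roughly what replacing the Q-table with a function approximator looks like when each decision is independent: the model is simply regressed onto the observed reward, with no next-state term. A tiny linear model stands in for the ANN to keep the sketch dependency-free. The single sensor feature, the synthetic "oil if x > 0.5" rule, and the uniformly random action choice during training are assumptions, not part of the original problem.

```python
import random

ACTIONS = ("drill", "skip")

def features(x):
    # Hypothetical state encoding: a bias term plus one sensor reading in [0, 1].
    return (1.0, x)

def q_value(weights, action, x):
    return sum(w * f for w, f in zip(weights[action], features(x)))

def train(steps=20000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    weights = {a: [0.0, 0.0] for a in ACTIONS}
    for _ in range(steps):
        x = rng.random()
        below = "oil" if x > 0.5 else "pipe"  # synthetic ground truth (assumed)
        action = rng.choice(ACTIONS)          # pure exploration while training
        correct = (action == "drill") == (below == "oil")
        r = 1.0 if correct else -1.0
        # Regress the prediction onto the immediate reward (no bootstrap term).
        error = r - q_value(weights, action, x)
        for i, f in enumerate(features(x)):
            weights[action][i] += alpha * error * f
    return weights

def choose(weights, x):
    # Greedy decision: pick the action with the higher predicted reward.
    return max(ACTIONS, key=lambda a: q_value(weights, a, x))

weights = train()
```

With millions of states, the same loop would apply with a neural network in place of the linear `q_value`; only the predictor changes, not the update target.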