I have a black box function `g(p,q)`

. It is evaluated using an external stimulator. For any given `p`

, we want to find `q`

that minimizes `g(p,q)`

. I want to model `f(p)`

that generates the *optimal* values of `q`

from `p`

. I want to model `f(p)`

as a neural network.

I am looking for directions on how to design the training process.

Both `p`

and `q`

are vectors with continuous values.

The closest training process I could think of was the cross-entropy method used in reinforcement learning. But, I think there could be a better way.