In NES, the population is represented by a distribution over parameters $p_\psi(\theta)$, itself parametrized by $\psi$, and the goal is to maximize the expected objective value $\mathbb{E}_{\theta \sim p_\psi} F(\theta)$.
The update rule is given by:
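$$\nabla_\psi \mathbb{E}_{\theta \sim p_\psi} F(\theta) = \mathbb{E}_{\theta \sim p_\psi}\left[ F(\theta) \, \nabla_\psi \log p_\psi(\theta) \right]$$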
In evolution strategies, my understanding from the text is that you have to remember the noise used to generate each individual and then, given their rewards, move θ toward the individuals that scored the most (or away from them if the reward is negative). But I'm kind of lost in the NES case: I don't really understand the update rule. How can I take the log probability of the population distribution?
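To make my understanding of the plain ES case concrete, here is a minimal sketch. It assumes each individual is generated by adding isotropic Gaussian noise with a fixed standard deviation `sigma` to θ. The names `es_step` and `toy_objective`, the quadratic objective, and the values of `n`, `sigma` and `alpha` are my own assumptions for illustration, not taken from the text.

```python
import numpy as np

def es_step(theta, F, rng, n=200, sigma=0.1, alpha=0.01):
    """One ES update as I understand it: perturb theta with Gaussian noise,
    score each individual, then move theta along the reward-weighted noise."""
    # Noise used to generate each individual; it has to be kept for the update.
    eps = rng.standard_normal((n, theta.size))
    # Reward of each perturbed individual theta + sigma * eps_i.
    rewards = np.array([F(theta + sigma * e) for e in eps])
    # Reward-weighted average of the noise directions: a positive reward pulls
    # theta toward that individual, a negative reward pushes it away.
    step_direction = rewards @ eps / (n * sigma)
    return theta + alpha * step_direction

def toy_objective(theta):
    # Hypothetical objective (an assumption) with its maximum at theta = (3, 3).
    return -np.sum((theta - 3.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(500):
    theta = es_step(theta, toy_objective, rng)
print(theta)
```

On this toy problem θ should drift toward the maximizer, but I may well be missing details of the actual method, such as how the rewards are normalized.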
Could anyone shed some more light on this, please?