Hello,

I’m interested in Evolution Strategies and I have a question about the OpenAI paper https://arxiv.org/pdf/1703.03864.pdf (see also https://arxiv.org/pdf/1106.4487.pdf).

In NES, the population is represented by a distribution over parameters *p*_{ψ}(*θ*), parametrized by *ψ*, and the goal is to maximize the expected objective value

𝔼_{θ ∼ p_{ψ}} *F*(*θ*)

The update rule is given by:

∇_{ψ} 𝔼_{θ ∼ p_{ψ}} *F*(*θ*) = 𝔼_{θ ∼ p_{ψ}} [*F*(*θ*) ∇_{ψ} log *p*_{ψ}(*θ*)]

In Evolution Strategies, what I understand from the text is that you have to remember the noise vectors used to generate each individual and then, given their rewards, move *θ* toward the individuals that scored the most (or away from them if their reward is negative). But I’m kind of lost in the NES case: I don’t really understand the update rule. How can I take the log probability of the population distribution?
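
To make my reading of the ES case concrete, here is a minimal sketch in Python with NumPy. The names `es_step` and `toy_reward`, the hyperparameter values, and the reward standardization are my own assumptions for illustration, not something taken from the papers, so please correct me if this is not what the text describes.

```python
import numpy as np

def es_step(theta, F, sigma=0.1, alpha=0.005, n=200, rng=None):
    """One ES update as I understand it: perturb theta with Gaussian noise,
    then move along the reward-weighted average of the stored noise vectors."""
    rng = np.random.default_rng() if rng is None else rng
    # Remember the noise that generated each individual; it is reused below.
    eps = rng.standard_normal((n, theta.size))
    rewards = np.array([F(theta + sigma * e) for e in eps])
    # Assumption: standardize rewards so below-average individuals get a
    # negative weight and the step does not depend on the reward scale.
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Reward-weighted sum of the noise vectors, scaled by 1 / (n * sigma).
    grad = eps.T @ rewards / (n * sigma)
    return theta + alpha * grad

def toy_reward(theta):
    # Hypothetical objective, maximized at theta = [1, -2].
    return -np.sum((theta - np.array([1.0, -2.0])) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(300):
    theta = es_step(theta, toy_reward, rng=rng)
```

Here the only thing kept per individual is its noise vector `eps[i]` and its reward, which is the part I think I understand. What I am missing is how this relates to the log-probability term in the NES formula.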

Could anyone shed some more light on this, please?

Thanks!