Hi all,
I’m having trouble understanding how and when to use `a.reinforce()`, despite the examples. In particular, why do I need it if I want to implement REINFORCE?
Thanks for your help
You need to call `.reinforce()` on the outputs of stochastic functions (`.bernoulli()`, `.normal()` and `.uniform()` at the moment) if you want autograd to estimate the gradient of the expectation of the reward. There is no need to use it if you don’t do any sampling.
You could implement REINFORCE manually; `.reinforce()` is just a convenient way of doing that.
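To make "manually" a bit more concrete, here is a minimal sketch of the score-function (REINFORCE) estimator in plain Python, without autograd. It assumes a single Bernoulli action whose probability is `sigmoid(theta)`; the names `reinforce_grad`, `exact_grad` and `reward` are made up for illustration and are not part of any PyTorch API.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_grad(theta, reward_fn, n_samples=10000, rng=None):
    """Monte Carlo estimate of d/dtheta E[reward_fn(a)], a ~ Bernoulli(sigmoid(theta))."""
    rng = rng or random.Random(0)
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        # Sample the stochastic action.
        a = 1 if rng.random() < p else 0
        # For a Bernoulli with logit theta, d/dtheta log P(a) = a - p.
        # REINFORCE weights this score by the reward received.
        total += reward_fn(a) * (a - p)
    return total / n_samples


def exact_grad(theta, reward_fn):
    """Closed-form gradient of the expected reward, for comparison."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (reward_fn(1) - reward_fn(0))


def reward(a):
    # Hypothetical reward: 1 for action 1, 0 for action 0.
    return 1.0 if a == 1 else 0.0


if __name__ == "__main__":
    theta = 0.3
    print("estimate:", reinforce_grad(theta, reward, 50000, random.Random(0)))
    print("exact:   ", exact_grad(theta, reward))
```

The estimate only needs sampled actions and their rewards, which is why this kind of bookkeeping is only relevant when the model actually samples.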
Thanks, that’s just what I needed!