alexis-jacq
(Alexis David Jacq)
1
Is this problem avoidable?
If non-null rewards are very rare, it makes sense to “bootstrap” the same past action-reward pair more than once whenever the received reward wasn’t null.
Is it possible to do that, or should we just avoid Variable.reinforce in that case?
You could use something like this if you want to be able to reinforce vars cumulatively:
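import torch

def reinforce(var, reward):
    # First call: no reward has been stored on the creator yet, so set it.
    if var.creator.reward is torch.autograd.stochastic_function._NOT_PROVIDED:
        var.creator.reward = reward
    # Later calls: add to the reward that is already stored.
    else:
        var.creator.reward += reward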
alexis-jacq
(Alexis David Jacq)
3
Thanks, it works!
Out of curiosity, why can a stochastic function normally be reinforced only once?