Is this problem avoidable?
If non-null rewards are very rare, it makes sense to “bootstrap” the same past action-reward pair more than once when the received reward wasn’t null.
Is it possible to do that? Or should we just avoid Variable.reinforce in that case?
You could use something like this if you want to be able to reinforce vars cumulatively:
    import torch

    def reinforce(var, reward):
        # First call: store the reward. Later calls: add to what is already stored.
        if var.creator.reward is torch.autograd.stochastic_function._NOT_PROVIDED:
            var.creator.reward = reward
        else:
            var.creator.reward += reward
Thanks, it works!
Out of curiosity, what is the reason a stochastic function can normally be reinforced only once?