Reinforce deprecated?

I’ve been using action.reinforce(reward) for policy-gradient-based training, but it seems there has been a change recently, and I now get this error:

File "/opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/torch/autograd/variable.py", line 209, in reinforce
if not isinstance(self.grad_fn, StochasticFunction):
NameError: name 'StochasticFunction' is not defined

I read on GitHub that .reinforce is being deprecated and that torch.distributions is the suggested replacement.

Is there a reason for this change? Reinforce seemed relatively simple and intuitive. It would be great if the reinforce example from pytorch were updated to reflect this change.

Here’s a good thread on the reason for the change. I think it can be summarized in two points: support for multiple stochastic outputs is difficult, and improving performance with Variables.

If you are on the 0.2 release, reinforce is still available. If you’re on master and have torch.distributions instead, the RL examples should now be as follows: https://github.com/pytorch/examples/pull/249

torch.distributions is much more general and suits a larger range of tasks. Building the equivalent of reinforce with it is relatively simple (and arguably cleaner, as it can be used to create a normal loss function to backpropagate).
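
For anyone landing here later, a minimal sketch of what that "normal loss function" can look like. This is my own illustration, not the code from the linked PR: the tiny linear policy, the random state, and the reward value are all made up, and it is written against a recent PyTorch (no Variable wrappers), so older builds may need small changes.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical stand-in for a policy network: 4 state features -> 2 action logits.
policy = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(policy.parameters(), lr=0.01)

state = torch.randn(1, 4)                      # made-up observation
probs = torch.softmax(policy(state), dim=-1)   # action probabilities

dist = Categorical(probs=probs)
action = dist.sample()             # sampling itself is not differentiable
log_prob = dist.log_prob(action)   # but the log-probability is

reward = 1.0  # made-up return for the sampled action

# Score-function (REINFORCE) surrogate loss.
# This takes the place of the old action.reinforce(reward) call.
loss = (-log_prob * reward).sum()

optimizer.zero_grad()
loss.backward()    # ordinary backprop through log_prob into the policy
optimizer.step()
```

The point is that the reward just scales the negative log-probability of the sampled action, so the result is a regular scalar loss that can be summed with other losses before calling backward().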

This helps, thanks a lot!