I'm working on an implementation of Neural Combinatorial Optimization with RL, and I got a bit stuck on the REINFORCE update for the pointer network.
Essentially, I'm using torch.multinomial to select from the input at each decoding step to construct the output, which is a permutation of the inputs (FYI, I call the elements of the input that the pointer network selects the "actions"). Then I run the "actions" through a reward function and call action.reinforce(r) on each "action". However, I think that because the actions are not directly the result of the call to torch.multinomial (only the indices I used to select them from the input are), I'm getting the error:
RuntimeError: reinforce() can be only called on outputs of stochastic functions
Is there any way to get around this so I can still use PyTorch's reinforce method? Thanks in advance!
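
To make the setup concrete, here is a minimal sketch of one possible workaround. It does not use reinforce() at all: it keeps the log-probability of each sampled index and builds the REINFORCE surrogate loss by hand. Everything named here is an assumption for illustration, not my actual code: `sample_permutation`, `reward_fn`, the `logits` tensor standing in for the pointer network's scores, and the path-length reward.

```python
import torch

torch.manual_seed(0)

def sample_permutation(logits, inputs):
    """Sample a permutation of `inputs` one step at a time.

    logits: (n, n) tensor; row t holds hypothetical decoder scores for step t.
    inputs: (n, d) tensor of input elements.
    Returns the selected elements ("actions"), the sampled indices, and the
    summed log-probability of the sampled indices.
    """
    n = inputs.size(0)
    mask = torch.zeros(n, dtype=torch.bool)  # True for already-chosen inputs
    indices, log_probs = [], []
    for t in range(n):
        step_logits = logits[t].masked_fill(mask, float("-inf"))
        probs = torch.softmax(step_logits, dim=0)
        idx = torch.multinomial(probs, 1).squeeze(0)  # the stochastic output
        log_probs.append(torch.log(probs[idx]))
        indices.append(idx)
        mask = mask.clone()  # avoid modifying a tensor saved for backward
        mask[idx] = True
    indices = torch.stack(indices)
    actions = inputs[indices]  # plain indexing, not a stochastic function
    return actions, indices, torch.stack(log_probs).sum()

def reward_fn(actions):
    # Hypothetical reward: negative length of the path through the actions.
    return -(actions[1:] - actions[:-1]).norm(dim=1).sum()

n, d = 5, 2
inputs = torch.rand(n, d)
# Stand-in for the pointer network's output; a real model would produce this.
logits = torch.zeros(n, n, requires_grad=True)

actions, indices, log_prob = sample_permutation(logits, inputs)
reward = reward_fn(actions)

# REINFORCE surrogate: the gradient flows through log_prob of the indices,
# while the reward is treated as a constant.
loss = -reward.detach() * log_prob
loss.backward()
```

The point of the sketch is that the reward is attached to the log-probability of the sampled indices rather than to the gathered "actions", which are only a deterministic lookup into the input.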