I’m working on an implementation of Neural Combinatorial Optimization with RL, and I got a bit stuck on the REINFORCE update for the pointer network.
Essentially, I’m using torch.multinomial to select from the input at each decoding step to construct the output, which is a permutation of the inputs (FYI, I call the elements of the input that the pointer network selects the “actions”). Then I run the “actions” through a reward function and call action.reinforce(r) for each “action”. However, I think the problem is that the actions are not the direct outputs of torch.multinomial (the indices I used to select the actions from the input are), so I’m getting the error:
RuntimeError: reinforce() can be only called on outputs of stochastic functions
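The error message says reinforce() only accepts outputs of stochastic functions, which here would be the indices returned by torch.multinomial, not the gathered “actions”. The same score-function update can be written out by hand, attaching the reward to the log-probability of the sampled indices. This is a minimal single-step sketch, assuming a random logits tensor as a stand-in for the pointer network and a made-up reward (negative norm of the chosen point); neither comes from the post.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: 4 problem instances, 5 candidate inputs each, 2-D points.
batch, n, dim = 4, 5, 2
inputs = torch.rand(batch, n, dim)

# Stand-in for the pointer network's scores at one decoding step (assumption).
logits = torch.randn(batch, n, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=-1)

# torch.multinomial returns integer indices; these are what was actually sampled.
idx = torch.multinomial(log_probs.exp().detach(), num_samples=1)  # (batch, 1)

# The "actions" are just the inputs selected by those indices.
actions = inputs[torch.arange(batch), idx.squeeze(1)]             # (batch, dim)

# Hypothetical reward computed from the actions; it is treated as a constant.
reward = -actions.norm(dim=-1).detach()                           # (batch,)

# REINFORCE: the gradient flows through log pi(idx), not through the actions.
log_p = log_probs.gather(1, idx).squeeze(1)                       # (batch,)
loss = -(reward * log_p).mean()
loss.backward()
```

Because the reward only scales log_p, nothing has to be differentiated through the indexing step that produces the actions.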
Is there any way to get around this so I can still use PyTorch’s reinforce method? Thanks in advance!
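For comparison, later PyTorch releases dropped the stochastic-Variable reinforce() method in favour of torch.distributions, where the update is written as an explicit loss on log_prob. The sketch below decodes a full permutation with a mask so no input is picked twice, then applies REINFORCE to the summed log-probability. The one-layer “pointer network”, the closed-tour-length reward and the mean-reward baseline are all illustrative assumptions, not the paper’s exact setup.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical

torch.manual_seed(0)

def sample_permutation(logits_fn, inputs):
    """Sample a permutation of the inputs one pointer step at a time.

    logits_fn is a hypothetical stand-in for the pointer network's decoder: it
    maps (inputs, mask) to unnormalised scores of shape (batch, n).
    Returns the sampled indices (batch, n) and the log-probability of the
    whole permutation (batch,).
    """
    batch, n, _ = inputs.shape
    mask = torch.zeros(batch, n, dtype=torch.bool)  # True = already chosen
    steps, log_prob = [], torch.zeros(batch)
    for _ in range(n):
        # Forbid re-selecting an input so the output is a permutation.
        scores = logits_fn(inputs, mask).masked_fill(mask, float("-inf"))
        dist = Categorical(logits=scores)
        idx = dist.sample()                         # integer indices, (batch,)
        log_prob = log_prob + dist.log_prob(idx)    # credit goes to the indices
        mask = mask | F.one_hot(idx, n).bool()      # out-of-place mask update
        steps.append(idx)
    return torch.stack(steps, dim=1), log_prob

def tour_length(inputs, perm):
    """Hypothetical TSP-style cost: length of the closed tour in perm order."""
    dim = inputs.shape[-1]
    tour = inputs.gather(1, perm.unsqueeze(-1).expand(-1, -1, dim))
    return (tour - tour.roll(-1, dims=1)).norm(dim=-1).sum(dim=1)

# Toy "pointer network": one linear layer scoring each input point (assumption).
batch, n, dim = 8, 6, 2
net = torch.nn.Linear(dim, 1)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
inputs = torch.rand(batch, n, dim)

perm, log_prob = sample_permutation(lambda x, m: net(x).squeeze(-1), inputs)
reward = -tour_length(inputs, perm)   # shorter tour = higher reward
advantage = reward - reward.mean()    # simple mean baseline (assumption)

# REINFORCE surrogate loss; the advantage is treated as a constant.
loss = -(advantage.detach() * log_prob).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The selected points are only used to compute the reward; the gradient reaches the network solely through the log-probabilities of the sampled indices, which is the part reinforce() used to handle implicitly.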