Reinforce weighted sampling

If I understand correctly, if we use a sampling method (e.g. torch.multinomial) and then call the .reinforce() method, the reward is backpropagated to whatever process created the samples.

My question is whether we can create a weighted sampler and use .reinforce() to update this sampler as well.


Yes, you can do that (I think).
See how the stochastic nodes are implemented:
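A minimal, framework-free sketch of the idea follows. It assumes the weighted sampler is parameterized by learnable logits (softmax weights), and it is not PyTorch's stochastic-node implementation. Instead it applies the same score-function (REINFORCE) rule by hand: for a sampled index `k`, the weights move by `lr * reward * d log p(k) / d logits`. The names `reinforce_step`, `reward_fn` and `lr`, and the toy reward, are hypothetical.

```python
import math
import random

def softmax(logits):
    """Turn unnormalized log-weights into sampling probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, reward_fn, lr=0.1, rng=random):
    """One REINFORCE (score-function) update of a learnable weighted sampler.

    logits:    the sampler's learnable log-weights.
    reward_fn: hypothetical callable mapping a sampled index to a scalar reward.
    """
    probs = softmax(logits)
    # Draw one index according to the current weights (what torch.multinomial does).
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    reward = reward_fn(k)
    # For a softmax, d log p(k) / d logits = one_hot(k) - probs.
    grad_log_p = [(1.0 if i == k else 0.0) - p for i, p in enumerate(probs)]
    # Gradient ascent on expected reward: logits += lr * reward * grad log p(k).
    new_logits = [l + lr * reward * g for l, g in zip(logits, grad_log_p)]
    return new_logits, k, reward

# Usage: a hypothetical task where only index 2 earns a reward.
rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(500):
    logits, _, _ = reinforce_step(
        logits, lambda k: 1.0 if k == 2 else 0.0, lr=0.1, rng=rng
    )
probs = softmax(logits)
print([round(p, 3) for p in probs])
```

Over the loop, the probability mass shifts toward the rewarded index. In newer PyTorch releases, where `.reinforce()` no longer exists, the same update is usually written by building `torch.distributions.Categorical(logits=...)`, drawing `sample()`, and calling `backward()` on `-dist.log_prob(sample) * reward`. Subtracting a baseline from the reward is a common way to reduce the variance of this estimator.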