If I understand correctly, if we draw samples with a sampling method (e.g. torch.multinomial) and then call the .reinforce() method on them, the reward is backpropagated to whatever process created the samples.
My question is whether we can create a weighted sampler with learnable weights and use .reinforce() to update this sampler as well.
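
To make the question concrete, here is a minimal sketch of what I have in mind. It is only an illustration under my own assumptions: the sampler is a vector of learnable logits (`logits`, a hypothetical name), the reward function `reward_fn` is made up, and instead of .reinforce() it uses torch.distributions.Categorical with a log_prob surrogate loss, which as far as I can tell is the same score-function (REINFORCE) estimator written out by hand.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical weighted sampler: one learnable, unnormalized weight (logit) per item.
logits = torch.zeros(5, requires_grad=True)
optimizer = torch.optim.SGD([logits], lr=0.5)

def reward_fn(idx):
    # Made-up reward for illustration: only item 3 is rewarded.
    return (idx == 3).float()

for _ in range(200):
    dist = Categorical(logits=logits)   # rebuilt each step from the current weights
    idx = dist.sample((16,))            # the draw itself is not differentiable
    reward = reward_fn(idx)
    # REINFORCE surrogate: the gradient of this loss is the score-function estimate
    # of the gradient of the negative expected reward w.r.t. the sampler's logits.
    loss = -(dist.log_prob(idx) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Sampling probabilities after training.
probs = torch.softmax(logits.detach(), dim=0)
print(probs)
```

In this sketch the gradient does reach `logits`, so the sampler's weights shift toward the rewarded item. What I am unsure about is whether the same holds when the weights come out of a larger network and the samples are drawn with torch.multinomial.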