# Can I backpropagate different distributions at once using Policy Gradient?

Hi, my issue is that within a single epoch I use a set of different batches, computing one (multinomial) distribution and drawing one sample for each batch. How can I backpropagate through those different distributions? I'm thinking of creating a single Multinomial containing all the distribution vectors as a matrix, and also getting all the sampled vectors at once (as a matrix, again). Does that make sense?
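A minimal sketch of that idea, assuming PyTorch and a REINFORCE-style surrogate loss (the sizes, logits, and rewards below are placeholders, not from the original post). `torch.distributions.Categorical` accepts a matrix of probability/logit vectors, so one batched distribution object can hold all the per-batch distributions, sample them at once, and backpropagate through all of them in a single `backward()` call:

```python
import torch
from torch.distributions import Categorical

# Hypothetical sizes: N batches, each with a K-way categorical distribution.
N, K = 4, 6
logits = torch.randn(N, K, requires_grad=True)

# One batched distribution holds all N distribution vectors at once;
# the leading dimension of `logits` is treated as a batch dimension.
dist = Categorical(logits=logits)

actions = dist.sample()             # shape (N,): one sampled action per batch
log_probs = dist.log_prob(actions)  # shape (N,): log pi(a_i | s_i)

# REINFORCE-style surrogate loss; `rewards` is a stand-in for real returns.
rewards = torch.randn(N)
loss = -(log_probs * rewards).mean()
loss.backward()                     # gradients flow into all N distributions

print(actions.shape, log_probs.shape, logits.grad.shape)
```

Note that sampling itself is not differentiated; the gradient flows through `log_prob`, which is exactly what the policy-gradient estimator needs.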

So, something like \pi(a_i = k \mid s) \propto p_i^k (1-p_i)^{1-k}, up to a normalization, for i = 1…N?
And you want to learn it as a single joint distribution \pi(A = [k_1, …, k_N] \mid s), right?
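If the components are independent given s, the joint log-probability is just the sum of the per-component log-probabilities, so the two views give the same gradient. A small sketch, assuming PyTorch (the shapes here are illustrative): `torch.distributions.Independent` reinterprets the batch dimension of a batched `Categorical` as an event dimension, turning N separate distributions into one joint distribution over action vectors:

```python
import torch
from torch.distributions import Categorical, Independent

# N=3 independent 5-way categorical components (illustrative sizes).
probs = torch.softmax(torch.randn(3, 5), dim=-1)
per_i = Categorical(probs=probs)   # batch of N separate distributions
joint = Independent(per_i, 1)      # one joint distribution over vectors

a = joint.sample()                 # shape (3,): one whole action vector
# The joint log-prob equals the sum of the per-component log-probs.
print(torch.allclose(joint.log_prob(a), per_i.log_prob(a).sum()))
```

So whether you treat it as N distributions or one joint distribution only changes the bookkeeping, not the policy-gradient estimate.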