# How to compute the log_prob of a weighted random sample without replacement

I’m looking for something like `torch.distributions.Multinomial().log_prob()`, but one that only allows 1 sample from each category.

I know you can do weighted random sampling without replacement using `torch.multinomial`, and I believe that is sampling from the categorical distribution. I’m trying to figure out how to compute the log_prob of such an unordered weighted sample. (I’m not sure whether to compute it as multiple independent samples from a categorical distribution, or whether it should be modeled as a conditional Bernoulli distribution, a hypergeometric distribution, etc.)
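For concreteness, here’s roughly the sampling step I mean (a sketch with made-up `weights`; it’s the log_prob of the resulting index set that I’m after):

```python
import torch

torch.manual_seed(0)
weights = torch.tensor([0.1, 0.2, 0.3, 0.4])  # made-up category weights

# k weighted draws without replacement -- returns k distinct category indices
k = 2
idx = torch.multinomial(weights, num_samples=k, replacement=False)
print(idx)  # two distinct indices in {0, 1, 2, 3}
```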

This is perhaps more of a statistics theory question, but just wanted to check it’s not already implemented before I go and try to do it myself.

Hi Kristen!

I’m neither a statistics expert, nor am I sure that I fully understand what you
are asking. Nonetheless I am cheerfully willing to provide misinformation to
this forum.

I think that for a single sample, you get the same distribution, and hence the
same `log_prob()`, from, say, `Multinomial` without replacement as you do
with replacement. With replacement, individual samples are independent.

Without replacement, you now get a non-trivial joint distribution – for example,
without replacement, `P((4, 4)) = 0` (because you can’t sample the value `4`
a second time unless you replace it after the first sample).

Best.

K. Frank

You’d need to adjust the combinatorial term in the multinomial pmf formula, filtering out cases where `x[i] > 1`. But you don’t have to do that to train the `p` parameters on observed `x` vectors: the pmf ratios would be the same as for the multinomial, and omitting a normalization constant won’t affect gradient directions.
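If it helps, a minimal sketch of that shortcut (made-up `logits` and an observed 0/1 count vector `x`; the combinatorial term is dropped because it’s constant w.r.t. the parameters, so the gradient is unaffected):

```python
import torch

# Trainable logits over 4 categories, and an observed without-replacement
# sample encoded as a 0/1 count vector (categories 1 and 3 were drawn).
logits = torch.zeros(4, requires_grad=True)
x = torch.tensor([0.0, 1.0, 0.0, 1.0])

# Multinomial-style log-prob of x, with the combinatorial/normalization
# term omitted -- it doesn't depend on logits, so gradients are unchanged.
log_p = torch.log_softmax(logits, dim=0)
unnormalized_log_prob = (x * log_p).sum()
unnormalized_log_prob.backward()
print(logits.grad)  # pushes probability toward the observed categories
```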

Thanks @KFrank and @googlebot. That makes sense (even though my question did not).

Looking at this again, my answer may be incorrect: it assumes that we observe censored draws made with a static `p`. If instead `p` is inflated on subsequent draws as categories get removed, the marginal probability ratios will differ… That’s indeed something similar to the multivariate hypergeometric (urn model) distribution, but with continuous weights…
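A sketch of that sequential-draw model (made-up `probs`; each draw removes its category and renormalizes, and the unordered probability sums the ordered one over all draw orders, which is only feasible for small samples):

```python
import itertools
import math

import torch

# Made-up category probabilities (unnormalized weights also work,
# since each draw renormalizes over the remaining categories).
probs = torch.tensor([0.1, 0.2, 0.3, 0.4])

def log_prob_ordered(seq, probs):
    """log P of drawing `seq` in that exact order, without replacement."""
    remaining = probs.clone()
    total = 0.0
    for i in seq:
        total = total + torch.log(remaining[i] / remaining.sum())
        remaining[i] = 0.0  # category removed for later draws
    return total

def log_prob_unordered(sample, probs):
    """log P of the unordered sample: logsumexp over all draw orders."""
    terms = [log_prob_ordered(perm, probs)
             for perm in itertools.permutations(sample)]
    return torch.logsumexp(torch.stack(terms), dim=0)

lp = log_prob_unordered((1, 3), probs)
```

For the sample `{1, 3}` this sums `P(1 then 3)` and `P(3 then 1)`, which differ because the renormalization depends on what was drawn first.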

No worries. I was expecting what I described to be a named/known probability distribution, but it turns out it’s not.