How to compute the log_prob of a weighted random sample without replacement

kristen · February 2, 2021, 6:31am

I’m looking for something like torch.distributions.Multinomial().log_prob(), but one that allows at most one sample from each category.

I know you can do weighted random sampling without replacement using torch.multinomial, and I believe that samples from the categorical distribution. What I’m trying to figure out is how to compute the log_prob of such an unordered weighted sample. (I’m not sure whether to compute it as multiple independent samples from a categorical distribution, or whether it should be modeled as a conditional Bernoulli distribution, a hypergeometric distribution, etc.)

This is perhaps more of a statistics theory question, but just wanted to check it’s not already implemented before I go and try to do it myself.

KFrank · February 2, 2021, 3:11pm

Hi Kristen!

I’m neither a statistics expert, nor am I sure that I fully understand what you
are asking. Nonetheless I am cheerfully willing to provide misinformation to
this forum.

I think that for a single sample, you get the same distribution, and hence the
same log_prob(), from, say, Multinomial without replacement as you do
with replacement. With replacement, individual samples are independent.

Without replacement, you now get a non-trivial joint distribution – for example,
without replacement, P(4, 4) = 0 (because you can’t sample the value 4
a second time unless you replace it after the first sample).
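
To make the P(4, 4) = 0 point concrete, here is a minimal sketch in plain Python (no torch, to keep it self-contained). It assumes that "without replacement" means the remaining weights are renormalized after each draw; the function name `log_prob_two_draws` and the weights are made up for illustration.

```python
import math

def log_prob_two_draws(p, first, second, replacement):
    """Log-probability of drawing `first` and then `second` from weights `p`."""
    total = sum(p)
    lp = math.log(p[first] / total)
    if replacement:
        # Independent draws: the second draw sees the same distribution.
        return lp + math.log(p[second] / total)
    if second == first:
        # The first item has been removed, so it cannot be drawn again.
        return -math.inf
    # Assumption: renormalize over the items that remain after the first draw.
    return lp + math.log(p[second] / (total - p[first]))

p = [0.1, 0.2, 0.3, 0.15, 0.25]  # illustrative weights for categories 0..4
print(math.exp(log_prob_two_draws(p, 4, 4, replacement=True)))   # about 0.0625
print(math.exp(log_prob_two_draws(p, 4, 4, replacement=False)))  # 0.0
```

The first draw has the same probability in both cases; only the second draw differs.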

Best.

K. Frank

googlebot · February 2, 2021, 6:35pm

You’d need to adjust the combinatorial term in the multinomial pmf formula, filtering out the x[i] > 1 cases. But you don’t have to do that to train the p parameters on observed x vectors: the pmf ratios would be the same as for the multinomial, and omitting a normalization constant won’t affect gradient directions.
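
A rough sketch of that filtered pmf, in plain Python. It assumes k draws are made with a fixed p and only outcomes with every x[i] <= 1 are kept, so the probability of a subset S is the product of p[i] over S divided by the sum of such products over all size-k subsets (the elementary symmetric polynomial e_k(p)). The names `elementary_symmetric` and `censored_log_prob` are hypothetical, not part of torch.

```python
import math

def elementary_symmetric(p, k):
    """e_k(p): sum over all k-subsets of the product of their entries."""
    e = [1.0] + [0.0] * k
    for w in p:
        # Iterate j downward so each weight enters a product at most once.
        for j in range(k, 0, -1):
            e[j] += w * e[j - 1]
    return e[k]

def censored_log_prob(p, subset):
    """Log-probability of an unordered `subset` of distinct indices,
    assuming fixed-p draws conditioned on no category repeating."""
    k = len(subset)
    log_numerator = sum(math.log(p[i]) for i in subset)
    # The normalizer depends on p, so this sketch keeps it explicitly.
    return log_numerator - math.log(elementary_symmetric(p, k))

p = [0.1, 0.2, 0.3, 0.4]  # illustrative probabilities
print(censored_log_prob(p, (0, 1)))
```

In this model the ratio of two subset probabilities is just the ratio of their p products, which matches the multinomial ratio for the same x vectors.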

kristen · February 3, 2021, 9:04am

Thanks @KFrank and @googlebot. That makes sense (even though my question did not).

googlebot · February 3, 2021, 3:44pm

Looking at this again, my answer may be incorrect - it assumes that we observe censored draws made with static p; if instead p is inflated on subsequent draws as categories get removed, marginal probability ratios will differ… That’s indeed something similar to multivariate hypergeometric (urn model), but for continuous numbers…
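
A minimal sketch of the "inflated p" case, in plain Python. It assumes each draw picks one of the remaining categories with probability proportional to its weight, that all weights are positive, and that the indices in a sample are distinct. The unordered probability is obtained by brute force, summing over every ordering, so it is only practical for small samples. The function names are made up for illustration.

```python
import itertools
import math

def ordered_log_prob(weights, sequence):
    """Log-probability of drawing `sequence` in exactly that order."""
    remaining = sum(weights)
    lp = 0.0
    for i in sequence:
        lp += math.log(weights[i]) - math.log(remaining)
        remaining -= weights[i]  # item i is removed; the rest are renormalized
    return lp

def unordered_log_prob(weights, subset):
    """Log-probability of ending up with `subset`, in any order.
    Sums over all k! orderings of the subset."""
    terms = [ordered_log_prob(weights, perm)
             for perm in itertools.permutations(subset)]
    m = max(terms)
    # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

w = [1.0, 2.0, 3.0, 4.0]  # illustrative weights
ratio = math.exp(unordered_log_prob(w, (0, 1)) - unordered_log_prob(w, (2, 3)))
print(ratio)
```

With these weights the ratio comes out near 0.127, whereas the static-p ratio of weight products would be (1*2)/(3*4) = 1/6, so the two models do give different probability ratios.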

kristen · February 18, 2021, 6:52am

No worries. I was expecting what I described to be a named/known probability distribution, but it turns out it’s not.