I’m looking for something like torch.distributions.Multinomial().log_prob(), but one that only allows at most one sample from each category.

I know you can do weighted random sampling without replacement using torch.multinomial, and I believe that amounts to sampling from a categorical distribution. I’m trying to figure out how to compute the log_prob of such an unordered weighted sample. (I’m not sure whether to compute it as multiple independent samples from a categorical distribution, or whether it should be modeled as a conditional Bernoulli distribution, a hypergeometric, etc.)
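For concreteness, this is the kind of computation I have in mind if each draw is modeled as a categorical over the categories still remaining (the function name here is just my own illustration, not an existing API):

```python
import torch

def log_prob_ordered_no_replacement(probs, idx):
    """Log-probability of drawing the ordered sequence `idx` without
    replacement: each draw is a categorical over the remaining mass."""
    remaining = probs.sum()
    logp = torch.zeros(())
    for i in idx:
        logp = logp + torch.log(probs[i] / remaining)
        remaining = remaining - probs[i]
    return logp

probs = torch.tensor([0.1, 0.2, 0.3, 0.4])
# For the unordered sample {1, 3}, sum over the two possible orderings.
lp = torch.logsumexp(
    torch.stack([log_prob_ordered_no_replacement(probs, [1, 3]),
                 log_prob_ordered_no_replacement(probs, [3, 1])]), dim=0)
```

But I don’t know if summing over orderings like this is the right model for what torch.multinomial actually does without replacement.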

This is perhaps more of a statistics theory question, but just wanted to check it’s not already implemented before I go and try to do it myself.

I’m neither a statistics expert, nor am I sure that I fully understand what you
are asking. Nonetheless I am cheerfully willing to provide misinformation to
this forum.

I think that for a single sample, you get the same distribution, and hence the
same log_prob(), from, say, Multinomial without replacement as you do
with replacement. With replacement, individual samples are independent.

Without replacement, you now get a non-trivial joint distribution – for example,
without replacement, P(4, 4) = 0 (because you can’t sample the value 4
a second time unless you replace it after the first sample).

You’d need to adjust the combinatorial term in the multinomial pmf formula, filtering out cases where any x[i] > 1. But you don’t have to do that to train the p parameters on observed x vectors: the pmf ratios are the same as for the plain multinomial, and omitting a normalization constant won’t affect the gradient directions.
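As a sketch of that training shortcut (my own illustration, under the assumption that the dropped constant doesn’t depend on p), you can just backpropagate the unnormalized log-likelihood sum(x * log p) directly:

```python
import torch

# Sketch: train logits on an observed 0/1 vector x using the
# unnormalized log-likelihood sum(x * log p), i.e. dropping the
# combinatorial / normalization term as described above.
logits = torch.zeros(4, requires_grad=True)
x = torch.tensor([1.0, 0.0, 1.0, 0.0])   # unordered sample {0, 2}

log_p = torch.log_softmax(logits, dim=0)
loss = -(x * log_p).sum()                # unnormalized negative log-likelihood
loss.backward()
# logits.grad pushes probability mass toward the observed categories.
```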

Looking at this again, my answer may be incorrect – it assumes that we observe censored draws made with a static p. If instead p is inflated on subsequent draws as categories get removed, the marginal probability ratios will differ… That is indeed something similar to the multivariate hypergeometric (urn model), but with continuous weights…