Gaussian Mixture Model maximum likelihood training

Typically, GMMs are trained with expectation-maximization, because of the need to enforce the unitary (sum-to-one) constraint on the categorical mixture weights.

However, in PyTorch it is possible to get a differentiable log probability from a GMM. Why is this possible? How exactly is the constraint implemented in the code?

Are you referring to MixtureSameFamily? That one contains a nested Categorical distribution object, with non-differentiable distribution parameters.

It is possible (though not trivial) to train Categorical with sampling; the docs describe the REINFORCE / score-function approach. That issue is orthogonal to using gradient descent to train GMMs in general, and to the "unitary constraint" (which is handled by the Categorical distribution class itself).
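
To make the last point concrete: Categorical takes unconstrained logits and normalizes them internally with a softmax, so the weights always form a valid probability vector and its log_prob stays differentiable with respect to the logits. A minimal sketch (my own toy numbers, not from this thread):

    import torch
    from torch.distributions import Categorical

    # Unconstrained logits: any real values are allowed.
    logits = torch.tensor([0.5, -1.2, 2.0], requires_grad=True)

    cat = Categorical(logits=logits)
    print(cat.probs)        # softmax(logits): non-negative
    print(cat.probs.sum())  # sums to 1 (up to float error)

    # log_prob is differentiable w.r.t. the logits (no sampling involved).
    loss = -cat.log_prob(torch.tensor(2))
    loss.backward()
    print(logits.grad)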

All I did was call the log_prob method of MixtureSameFamily and train the parameters with gradient descent over samples from my dataset, and it works.
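
In case it helps others, here is a minimal sketch of that setup (the toy data, shapes, parameterization via log-std-devs, and optimizer settings are my own choices, not something the thread specifies):

    import torch
    import torch.distributions as D

    torch.manual_seed(0)
    # Toy 2-D data drawn from two clusters, just for illustration.
    labels = torch.randint(0, 2, (1000, 1)).float()
    data = torch.randn(1000, 2) + labels * torch.tensor([3.0, -2.0])

    K, dim = 4, 2
    # Unconstrained parameters: weight logits, means, and log-std-devs.
    logits = torch.zeros(K, requires_grad=True)
    means = torch.randn(K, dim, requires_grad=True)
    log_stds = torch.zeros(K, dim, requires_grad=True)

    opt = torch.optim.Adam([logits, means, log_stds], lr=1e-2)

    for step in range(2000):
        # Rebuild the distribution each step so it sees the current parameters.
        mix = D.Categorical(logits=logits)
        comp = D.Independent(D.Normal(means, log_stds.exp()), 1)
        gmm = D.MixtureSameFamily(mix, comp)

        loss = -gmm.log_prob(data).mean()  # average negative log-likelihood
        opt.zero_grad()
        loss.backward()
        opt.step()

After training, logits.softmax(-1) gives the recovered mixture weights, so the sum-to-one constraint never has to be imposed by hand.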

Ah, right, log_prob-based (local) MLE, estimating all parameters simultaneously, does work; it is just not great with random initializations (as for neural nets) and SGD. It may be OK for some tasks.

To directly answer your question:

    def log_prob(self, x):
        # Pad x with a singleton dimension so it broadcasts against the k components.
        x = self._pad(x)
        log_prob_x = self.component_distribution.log_prob(x)  # [S, B, k]
        # log_softmax ensures the log mixture weights are normalized to sum
        # to one -- this is where the "unitary constraint" lives.
        log_mix_prob = torch.log_softmax(self.mixture_distribution.logits,
                                         dim=-1)  # [B, k]
        return torch.logsumexp(log_prob_x + log_mix_prob, dim=-1)  # [S, B]

This is just the mixture density formula, p(x) = Σ_k π_k p_k(x), computed in log space; every operation in it is differentiable (it is essentially a weighted sum).
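
As a sanity check, the result of MixtureSameFamily.log_prob matches the mixture density computed by hand in log space (toy numbers of my own):

    import torch
    import torch.distributions as D

    logits = torch.tensor([0.2, -0.5, 1.0])
    means = torch.tensor([-2.0, 0.0, 3.0])
    stds = torch.tensor([0.5, 1.0, 2.0])
    x = torch.linspace(-5, 5, 7)

    gmm = D.MixtureSameFamily(D.Categorical(logits=logits),
                              D.Normal(means, stds))

    # log p(x) = logsumexp_k( log softmax(logits)_k + log N(x; mu_k, sigma_k) )
    log_w = torch.log_softmax(logits, dim=-1)                    # [k]
    log_comp = D.Normal(means, stds).log_prob(x.unsqueeze(-1))   # [N, k]
    manual = torch.logsumexp(log_w + log_comp, dim=-1)           # [N]

    print(torch.allclose(manual, gmm.log_prob(x)))  # expected: True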