Entropy of a Multidimensional Tensor

I’m looking to compute the discrete entropy (not cross-entropy) of a multidimensional tensor in my loss function. The tensor has shape BATCH_S x D: there are BATCH_S data points, each a D-dimensional vector. The formula requires the probability of seeing each data point, and I can’t find a way to compute this while retaining the ability to do a backward pass. All I need is the frequency of each distinct data point, because dividing that by the sum of the frequencies gives the probability.

There’s torch.histc, but that doesn’t work for multidimensional data, and neither does torch.bincount. There’s also torch.unique, which has the optional return_counts param, but the counts it returns have no grad_fn.
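
For reference, here is a minimal sketch of the counting approach described above, assuming each row of the tensor is one data point. The toy tensor `x` is made up for illustration. `torch.unique` with `dim=0` and `return_counts=True` does give the row frequencies, but the counts are an integer tensor, so the resulting entropy is not connected to the autograd graph.

```python
import torch

# Toy batch (illustrative values): 4 data points, D = 2, with one repeated row.
x = torch.tensor([[0., 1.], [0., 1.], [2., 3.], [4., 5.]], requires_grad=True)

# Count how often each distinct row occurs. detach() is explicit here;
# the counts are integers and cannot carry gradients in any case.
rows, counts = torch.unique(x.detach(), dim=0, return_counts=True)

# Empirical probabilities and plug-in entropy (in nats).
probs = counts.float() / counts.sum()
entropy = -(probs * probs.log()).sum()

print(entropy.item())
print(entropy.requires_grad)  # no path back to x
```

So the value itself is easy to get; the problem is only that nothing can be backpropagated through it.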

Any help is appreciated! Thanks


Did you find a solution to this? I have the same question.

No, these functions are inherently non-differentiable, because they require binning, which uses an indicator function. Take a look at this paper, which looks at a continuous approximation of mutual information, which is a function of entropy. You can approximate the indicator function (required for binning) with a triangular kernel function, but you will need to derive your own backward function.
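
A rough sketch of the triangular-kernel idea, not the method from the paper: the function name `soft_entropy`, the grid of bin centers, and the kernel width are all assumptions chosen for illustration. Each data point spreads its unit of "count" over nearby bin centers with a product of per-dimension triangular kernels, so the soft counts vary smoothly with the inputs. In this particular sketch the kernel is built from ordinary tensor ops, so autograd supplies the backward pass; a hand-derived backward would matter if the kernel were written as a custom op.

```python
import torch

def soft_entropy(x, centers, width, eps=1e-12):
    """Entropy of a soft histogram. x: (B, D), centers: (K, D). Hypothetical helper."""
    # Per-dimension distance from every point to every bin center: (B, K, D).
    diff = (x.unsqueeze(1) - centers.unsqueeze(0)).abs()
    # Triangular kernel: 1 at the center, falling linearly to 0 at distance `width`.
    tri = torch.clamp(1.0 - diff / width, min=0.0)
    # Product over dimensions gives each point's weight for each bin: (B, K).
    weights = tri.prod(dim=-1)
    # Soft counts per bin, normalized to probabilities.
    soft_counts = weights.sum(dim=0)
    probs = soft_counts / (soft_counts.sum() + eps)
    return -(probs * torch.log(probs + eps)).sum()

torch.manual_seed(0)
x = torch.rand(32, 2, requires_grad=True)      # BATCH_S = 32, D = 2, values in [0, 1)

# Assumed regular grid: 5 centers per dimension, spacing equal to the kernel width.
axis = torch.linspace(0.0, 1.0, 5)
centers = torch.cartesian_prod(axis, axis)     # (25, 2)

h = soft_entropy(x, centers, width=0.25)
h.backward()
print(h.item(), x.grad.shape)
```

Note that the number of bin centers grows exponentially with D, so a full grid like this is only practical for small D.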