How to define an information entropy loss?

fangwei123456 · April 13, 2019, 2:03pm

I am trying to define an information entropy loss. The input is a tensor of shape 1*n whose elements all lie in [0, 4]. EntroyLoss should compute the information entropy of that input.
For example, if the input is

[0,1,0,2,4,1,2,3]

then

p(0) = 2 / 8 = 0.25
p(1) = 2 / 8 = 0.25
p(2) = 2 / 8 = 0.25
p(3) = 1 / 8 = 0.125
p(4) = 1 / 8 = 0.125

so the information entropy loss is

Loss = -( p(0)*log2(p(0)) + p(1)*log2(p(1)) + p(2)*log2(p(2)) + p(3)*log2(p(3)) + p(4)*log2(p(4)) )
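
As a quick sanity check of the example above, here is a minimal plain-Python sketch that sums over all five values (0 through 4). The helper name `entropy_bits` is only illustrative and is not part of the code in this thread.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    # Only values that actually occur are counted, so log2(0) never appears.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

example = [0, 1, 0, 2, 4, 1, 2, 3]
print(entropy_bits(example))  # 2.25
```

Three values have probability 0.25 (contributing 0.5 bits each) and two have probability 0.125 (contributing 0.375 bits each), which gives 2.25 bits in total.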

My code is here:

import torch
import torch.nn as nn

class EntroyLoss(nn.Module):
    def __init__(self):
        super(EntroyLoss, self).__init__()
    def forward(self, x):
        y = x.view(-1)
        p = torch.zeros([5])
        for i in range(y.shape[0]):
            p[y[i].int()] = p[y[i].int()] + 1

        p = p.float() / y.shape[0] 

        entropy = -p.mul(p.log2()).sum()
        return entropy

But PyTorch cannot calculate the gradient:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
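
A likely reason for this error, shown as a small sketch: the input is used only as an integer index into a freshly created count tensor, and integer tensors cannot carry gradients, so nothing connects the result back to the input. The snippet uses `.long()` instead of `.int()` for the index and the variable names are illustrative, but the effect should be the same.

```python
import torch

x = torch.tensor([0., 1., 0., 2., 4., 1., 2., 3.], requires_grad=True)

idx = x.long()                   # casting to an integer dtype drops gradient tracking
counts = torch.zeros(5)          # fresh tensor, not connected to x
for i in idx:
    counts[i] = counts[i] + 1    # x enters only as an index, never as a value

p = counts / x.numel()
entropy = -(p * p.log2()).sum()

print(idx.requires_grad)         # False
print(entropy.grad_fn)           # None, so entropy.backward() would raise the error above
```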

Tony-Y · April 13, 2019, 2:53pm

You can use torch.histc() to compute the histogram for the information entropy, but then another error occurs:

RuntimeError: the derivative for 'histc' is not implemented

class EntroyLoss(nn.Module):
    def __init__(self):
        super(EntroyLoss, self).__init__()
    def forward(self, x):
        y = x.view(-1)
        p = torch.histc(y, bins=5, min=0, max=4)
        p = p / y.shape[0]

        entropy = -p.mul(p.log2()).sum()
        return entropy

fangwei123456 · April 14, 2019, 12:49am

Thanks! I found it at https://pytorch.org/docs/stable/torch.html#torch.histc. But I got this:
_th_histic is not implemented for type torch.cuda.FloatTensor

Tony-Y · April 14, 2019, 8:00am

Can you define a derivative of a histogram?

Tony-Y · April 14, 2019, 9:18am

Math Problem
Given a vector X=(…, x_i ,…), H(X) denotes a histogram of X. Define a derivative of H(X) with respect to the i-th value x_i.

A Solution
X_i(+) and X_i(-) denote the two vectors obtained by replacing the i-th value of X with x_i + 1 and x_i - 1, respectively. We can define an interpolated histogram

I(X, h) = [ (1+h) H(X_i(+)) + (1-h) H(X_i(-)) ] / 2

for -1 < h < 1. Differentiating this interpolated histogram with respect to h, we get

dI(X,h)/dh =  [ H(X_i(+)) - H(X_i(-)) ] / 2.
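
To make the formula concrete, here is a minimal plain-Python sketch that evaluates [ H(X_i(+)) - H(X_i(-)) ] / 2 for one element. The function names are illustrative, and it assumes integer values with 5 bins covering 0 to 4, with out-of-range values ignored (which, as far as I know, mirrors how torch.histc treats values outside [min, max]).

```python
def histogram(values, num_bins=5):
    """Count integer values in 0..num_bins-1; out-of-range values are ignored (assumption)."""
    counts = [0] * num_bins
    for v in values:
        if 0 <= v < num_bins:
            counts[v] += 1
    return counts

def histogram_derivative(values, i, num_bins=5):
    """[H(X_i(+)) - H(X_i(-))] / 2, one entry per histogram bin."""
    plus = list(values)
    plus[i] += 1                 # X_i(+): i-th value replaced by x_i + 1
    minus = list(values)
    minus[i] -= 1                # X_i(-): i-th value replaced by x_i - 1
    h_plus = histogram(plus, num_bins)
    h_minus = histogram(minus, num_bins)
    return [(a - b) / 2 for a, b in zip(h_plus, h_minus)]

x = [0, 1, 0, 2, 4, 1, 2, 3]
print(histogram_derivative(x, 3))  # x[3] == 2 -> [0.0, -0.5, 0.0, 0.5, 0.0]
```

For an element with value k, the result is +0.5 in bin k+1 and -0.5 in bin k-1; the count in bin k itself cancels out.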

@fangwei123456 If you need this derivative, you can define a backward() method as described in Defining new autograd functions.
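
A rough sketch of what such a custom function could look like, under some assumptions: the inputs are (close to) integers in [0, num_bins - 1], counting is done with torch.bincount rather than torch.histc, and empty bins are masked out so that log2(0) does not produce NaN. The names `RoundedHistogram` and `entropy_loss` are hypothetical, and this is one possible reading of the derivative defined above, not a definitive implementation.

```python
import torch

class RoundedHistogram(torch.autograd.Function):
    """Histogram of integer-valued inputs with a hand-written backward (hypothetical name)."""

    @staticmethod
    def forward(ctx, x, num_bins):
        # Assumption: values are (close to) integers in [0, num_bins - 1].
        idx = x.detach().round().long().clamp(0, num_bins - 1)
        ctx.save_for_backward(idx)
        return torch.bincount(idx.reshape(-1), minlength=num_bins).to(x.dtype)

    @staticmethod
    def backward(ctx, grad_out):
        (idx,) = ctx.saved_tensors
        # dH_k/dx_i = (delta(k, x_i + 1) - delta(k, x_i - 1)) / 2, so
        # grad_x_i = (grad_out[x_i + 1] - grad_out[x_i - 1]) / 2.
        # Zero padding on both sides handles the first and last bins.
        zero = grad_out.new_zeros(1)
        padded = torch.cat([zero, grad_out, zero])   # padded[k + 1] == grad_out[k]
        grad_x = (padded[idx + 2] - padded[idx]) / 2
        return grad_x, None                          # no gradient for num_bins


def entropy_loss(x, num_bins=5):
    p = RoundedHistogram.apply(x, num_bins) / x.numel()
    # Empty bins contribute 0; the clamp keeps log2 finite inside torch.where.
    terms = torch.where(p > 0, p * p.clamp_min(1e-12).log2(), torch.zeros_like(p))
    return -terms.sum()


x = torch.tensor([0., 1., 0., 2., 4., 1., 2., 3.], requires_grad=True)
loss = entropy_loss(x)
loss.backward()
print(loss.item())
print(x.grad)
```

With this in place, backward() runs without the "does not require grad" error. Whether the resulting gradient is useful for training depends on the application, since the underlying histogram is still piecewise constant.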

fangwei123456 · April 14, 2019, 9:34am

Thanks! I will give it a try.