Calculating the Entropy loss

ankitvad · March 7, 2018, 4:41am

I have a model whose loss should maximize the entropy (not the cross-entropy) of the output, i.e. I’m trying to minimize the negative entropy.
H = -sum(p(x) * log(p(x)))
Let’s say:

import torch
import torch.nn as nn

def HLoss(res):
    S = nn.Softmax(dim=1)
    LS = nn.LogSoftmax(dim=1)
    b = S(res) * LS(res)  # p * log(p), elementwise
    b = torch.mean(b, 1)
    b = torch.sum(b)
    return b

m = model()
# m is the output, with shape [BatchSize, 3].
g = HLoss(m)
g.backward()

Would this calculate the gradients for the parameters of the model that produced m?
Is there some way to check whether the gradients are calculated?

ptrblck · March 7, 2018, 6:11am

I would create a new Module:


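import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class HLoss(nn.Module):
    def __init__(self):
        super(HLoss, self).__init__()

    def forward(self, x):
        # p * log(p) per element, then negate the sum to get the entropy H
        b = F.softmax(x, dim=1) * F.log_softmax(x, dim=1)
        b = -1.0 * b.sum()
        return b

criterion = HLoss()
# Since PyTorch 0.4, Variable simply returns a Tensor, so plain tensors work too.
x = Variable(torch.randn(10, 10))
w = Variable(torch.randn(10, 3), requires_grad=True)
output = torch.matmul(x, w)
loss = criterion(output)
loss.backward()
print(w.grad)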

I don’t really know why you calculate the mean of b, so just add it to the code, if you need it.

ankitvad · March 7, 2018, 6:29pm

@ptrblck Oh, sorry, the mean was a mistake.
Is there a specific reason why you suggest using a class instead of a function?
The function also gives me w.grad, so I’m just wondering.
Also, if my goal is to maximize the entropy, which should be preferred:

- Changing the line to b = b.sum() (not multiplying it by -1) and then minimizing that.
- Minimizing -H.

Or is it the same thing, implementation-wise?
Also, if I get the output from a model, is there any way to check whether the gradients for the model are being calculated?

ptrblck · March 7, 2018, 8:28pm

I think it’s just a matter of taste and apparently I like the Module class, since it looks “clean” to me. All parameters are defined in the __init__ while the forward method just applies the desired behavior. Using a function would work as well of course, since my Module is stateless.
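
To illustrate that the two styles are interchangeable here, a minimal sketch (the name entropy_loss and the tensor shapes are just assumptions for the demo, not something from the original code) that runs the same loss as a plain function and as a Module and compares the resulting gradients:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def entropy_loss(logits):
    # Functional form: H = -sum(p * log p), summed over the whole batch.
    return -(F.softmax(logits, dim=1) * F.log_softmax(logits, dim=1)).sum()


class HLoss(nn.Module):
    # Module form: it holds no state, so it only delegates to the function.
    def forward(self, logits):
        return entropy_loss(logits)


torch.manual_seed(0)
x = torch.randn(10, 10)
w_fn = torch.randn(10, 3, requires_grad=True)
# Independent copy of the same weights for the Module-based run.
w_mod = w_fn.detach().clone().requires_grad_(True)

entropy_loss(x @ w_fn).backward()
HLoss()(x @ w_mod).backward()

same_grads = torch.allclose(w_fn.grad, w_mod.grad)
print(same_grads)
```

A Module starts to pay off once the loss carries configuration or learnable state; for a stateless loss like this one the choice is mostly stylistic.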

If you would like to maximize the entropy, you could just remove the multiplication with -1. The loss is then sum(p * log(p)) = -H, so minimizing it maximizes H.
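
As a sanity check on the sign convention, here is a rough sketch that takes one gradient-descent step on the loss without the -1 factor and compares the entropy before and after. The batch shape, the step size of 0.1, and the helper name entropy are arbitrary choices for the demo:

```python
import math

import torch
import torch.nn.functional as F


def entropy(logits):
    # H = -sum(p * log p), summed over the batch.
    p = F.softmax(logits, dim=1)
    return -(p * F.log_softmax(logits, dim=1)).sum()


torch.manual_seed(0)
logits = torch.randn(4, 3, requires_grad=True)
h_before = entropy(logits).item()

# The loss without the -1 factor is sum(p * log p), which equals -H.
loss = -entropy(logits)
loss.backward()

# One plain gradient-descent step directly on the logits.
with torch.no_grad():
    logits -= 0.1 * logits.grad

h_after = entropy(logits).item()
# Upper bound on H: all 4 rows uniform over 3 classes.
h_max = 4 * math.log(3)
print(h_before, h_after, h_max)
```

With a small enough step the entropy should rise toward the uniform-distribution bound, which is what "minimizing -H" is meant to achieve.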

Assuming your model has a layer called linear1, you can check the gradients with model.linear1.weight.grad after calling backward(); the attribute stays None until a backward pass has populated it.
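
A small sketch of that check, extended to every parameter via named_parameters(). TinyModel is a hypothetical stand-in for your model; only the layer name linear1 is taken from the sentence above, and the layer sizes are made up:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyModel(nn.Module):
    # Hypothetical stand-in model with a single layer named linear1.
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(10, 3)

    def forward(self, x):
        return self.linear1(x)


model = TinyModel()
# Before any backward pass the .grad attribute is still None.
grad_before = model.linear1.weight.grad

logits = model(torch.randn(8, 10))
# sum(p * log p) = -H; minimizing it maximizes the entropy.
loss = (F.softmax(logits, dim=1) * F.log_softmax(logits, dim=1)).sum()
loss.backward()

# After backward(), parameters that took part in the graph hold a .grad tensor.
for name, param in model.named_parameters():
    has_grad = param.grad is not None
    grad_norm = param.grad.norm().item() if has_grad else None
    print(name, has_grad, grad_norm)
```

A parameter whose .grad is still None after backward() was either not used in computing the loss or has requires_grad set to False.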