I have a model in which the loss is maximizing the entropy (not cross-entropy) of the output, i.e. I'm trying to minimize the negative entropy:
H = -sum(p(x) * log(p(x)))
```python
import torch
import torch.nn as nn

S = nn.Softmax(dim=1)
LS = nn.LogSoftmax(dim=1)
b = S(res) * LS(res)   # elementwise p(x) * log(p(x)); res is the model output
b = torch.mean(b, 1)
b = torch.sum(b)
```
```python
m = model()   # m is a [BatchSize, 3] output
g = HLoss(m)
```
Would this calculate the gradients for m, i.e. back through model()?
Is there some way to check whether the gradients are calculated?
I would create a new class for your loss function:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HLoss(nn.Module):
    def forward(self, x):
        b = F.softmax(x, dim=1) * F.log_softmax(x, dim=1)  # p * log p
        b = -1.0 * b.sum()
        return b

criterion = HLoss()
x = torch.randn(10, 10)
w = torch.randn(10, 3, requires_grad=True)
output = torch.matmul(x, w)
loss = criterion(output)
loss.backward()
print(w.grad)  # not None, so gradients were computed
```
I don't really know why you calculate the mean of b, so just add it to the code if you need it.
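As a quick sanity check, the same value can be computed with torch.distributions.Categorical, whose entropy() returns -sum(p * log p) per sample; the sum over the batch should match HLoss. This sketch reuses criterion and output from the snippet above:

```python
from torch.distributions import Categorical

# sum of per-sample entropies; should equal criterion(output)
reference = Categorical(logits=output).entropy().sum()
print(torch.allclose(criterion(output), reference))  # True
```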
@ptrblck Oh, sorry, the mean was a mistake.
Is there a specific reason why you suggest using a class instead of a function?
A function also produces the gradients (w.grad). Just wondering.
Also, if my goal is to maximize the entropy, which should be preferred:

```python
b = b.sum()  # not multiplying it by -1
```

and then minimizing that? Or is it the same thing, implementation-wise?
Also, if I get the output from a model, is there any way to check whether the gradients for the model are being calculated or not?
I think it's just a matter of taste, and apparently I like the `Module` class, since it looks "clean" to me. All parameters are defined in the `__init__`, while the `forward` method just applies the desired behavior. Using a function would work as well of course, since my `Module` is stateless.
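For comparison, a minimal functional sketch of the same loss (the name h_loss is just a placeholder):

```python
import torch.nn.functional as F

def h_loss(x):
    # stateless equivalent of HLoss.forward
    b = F.softmax(x, dim=1) * F.log_softmax(x, dim=1)
    return -1.0 * b.sum()
```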
If you would like to maximize the entropy, you could just remove the multiplication with -1.0: the loss then equals the negative entropy, so minimizing it maximizes the entropy.
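A minimal sketch of that variant (the class name MaxEntropyLoss is just a placeholder):

```python
class MaxEntropyLoss(nn.Module):
    def forward(self, x):
        b = F.softmax(x, dim=1) * F.log_softmax(x, dim=1)
        return b.sum()  # equals -H, so minimizing this maximizes the entropy
```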
Assuming your model has a layer called `linear1`, you can check the gradients with:
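```python
model = MyModel()  # MyModel is a placeholder for your own model class
output = model(x)
loss = criterion(output)
loss.backward()
print(model.linear1.weight.grad)  # a tensor here (not None) means gradients were computed
```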