I have a model in which the loss is maximizing the entropy (not cross-entropy) of the output, i.e. I'm trying to minimize the negative entropy:
H = -sum(p(x) * log(p(x)))
```python
import torch
import torch.nn as nn

S = nn.Softmax(dim=1)
LS = nn.LogSoftmax(dim=1)
b = S(res) * LS(res)   # elementwise p(x) * log(p(x)); res is the model output
b = torch.mean(b, 1)
b = torch.sum(b)
```
```python
m = model()   # m is a [BatchSize, 3] output
g = HLoss(m)
```
Would this calculate the gradients for m, i.e. back through model()?
Is there some way to check whether the gradients are calculated?
I would create a new class for your loss function:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HLoss(nn.Module):
    def forward(self, x):
        b = F.softmax(x, dim=1) * F.log_softmax(x, dim=1)  # p * log p
        b = -1.0 * b.sum()
        return b

criterion = HLoss()
x = torch.randn(10, 10)
w = torch.randn(10, 3, requires_grad=True)
output = torch.matmul(x, w)
loss = criterion(output)
loss.backward()
print(w.grad)  # not None, so gradients were computed
```
I don't really know why you calculate the mean of b, so just add it to the code if you need it.
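As a quick sanity check, the same value can be computed with torch.distributions.Categorical, whose entropy() returns -sum(p * log p) per sample; the sum over the batch should match HLoss. This sketch reuses criterion and output from the snippet above:

```python
from torch.distributions import Categorical

# sum of per-sample entropies; should equal criterion(output)
reference = Categorical(logits=output).entropy().sum()
print(torch.allclose(criterion(output), reference))  # True
```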
@ptrblck Oh, sorry, the mean was a mistake.
Is there a specific reason why you suggest using a class instead of a function?
A function also produces the gradients (w.grad). Just wondering.
Also, if my goal is to maximize the entropy, which should be preferred:

```python
b = b.sum()  # not multiplying it by -1
```

and then minimizing that? Or is it the same thing, implementation-wise?
Also, if I get the output from a model, is there any way to check whether the gradients for the model are being calculated or not?
I think it's just a matter of taste, and apparently I like the `Module` class, since it looks "clean" to me. All parameters are defined in the `__init__`, while the `forward` method just applies the desired behavior. Using a function would work as well of course, since my `Module` is stateless.
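For comparison, a minimal functional sketch of the same loss (the name h_loss is just a placeholder):

```python
import torch.nn.functional as F

def h_loss(x):
    # stateless equivalent of HLoss.forward
    b = F.softmax(x, dim=1) * F.log_softmax(x, dim=1)
    return -1.0 * b.sum()
```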
If you would like to maximize the entropy, you could just remove the multiplication with -1.0: the loss then equals the negative entropy, so minimizing it maximizes the entropy.
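A minimal sketch of that variant (the class name MaxEntropyLoss is just a placeholder):

```python
class MaxEntropyLoss(nn.Module):
    def forward(self, x):
        b = F.softmax(x, dim=1) * F.log_softmax(x, dim=1)
        return b.sum()  # equals -H, so minimizing this maximizes the entropy
```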
Assuming your model has a layer called `linear1`, you can check the gradients with:
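```python
model = MyModel()  # MyModel is a placeholder for your own model class
output = model(x)
loss = criterion(output)
loss.backward()
print(model.linear1.weight.grad)  # a tensor here (not None) means gradients were computed
```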