# Details of torch.nn.CrossEntropyLoss

Hi,
For a problem I have used CrossEntropyLoss as the criterion to evaluate the performance of a neural network. To learn the details of this function, I visited this page: http://pytorch.org/docs/master/_modules/torch/nn/modules/loss.html#CrossEntropyLoss. There, CrossEntropyLoss is defined using the F.cross_entropy function, where F comes from `from … import functional as F`. I'm unable to find the source code of the F.cross_entropy function. Does anybody know the details of this function?

I saw this link. In particular, I'm interested in the implementation of the F.cross_entropy function.


I assume you read the definition of the cross_entropy function in that file.

```python
def cross_entropy(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True):
    return nll_loss(log_softmax(input, 1), target, weight, size_average, ignore_index, reduce)
```
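In other words, cross entropy is just negative log-likelihood applied to log-softmax outputs. That decomposition can be checked with a small pure-Python sketch (plain math instead of torch tensors, purely for illustration; the function names mirror the functional API but are re-implemented from scratch here):

```python
import math

def log_softmax(logits):
    # log(exp(x_i) / sum_j exp(x_j)), computed in a numerically stable way
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def nll_loss(log_probs, target):
    # negative log-likelihood of the target class
    return -log_probs[target]

def cross_entropy(logits, target):
    # same decomposition as F.cross_entropy: nll_loss(log_softmax(...))
    return nll_loss(log_softmax(logits), target)

# with two equal logits, the softmax probability is 1/2, so the loss is log(2)
loss = cross_entropy([0.0, 0.0], 0)
```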


Which bits did you not understand?

What is the significance of ignore_index? And one more thing I want to know: what is the range of this CrossEntropyLoss function? Will it always be between 0 and 1?

From the docs
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets.
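A minimal pure-Python sketch of that averaging behavior (this is an illustration of the documented semantics, not PyTorch's actual implementation):

```python
import math

def cross_entropy_one(logits, target):
    # -log softmax probability of the target class, numerically stable
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def batched_cross_entropy(batch_logits, targets, ignore_index=-100):
    # compute per-sample losses, skipping any target equal to ignore_index
    losses = [cross_entropy_one(logits, t)
              for logits, t in zip(batch_logits, targets)
              if t != ignore_index]
    # size_average=True: mean over non-ignored targets only
    return sum(losses) / len(losses)

# the second sample has target -100, so only the first contributes
loss = batched_cross_entropy([[0.0, 0.0], [5.0, 1.0]], [0, -100])
```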

Also from the docs the formula for CrossEntropyLoss is
loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))

Now some basic math

• exp(x[class]) is always positive
• \sum_j exp(x[j]) is always at least exp(x[class]), since the sum includes that term
• so exp(x[class]) / (\sum_j exp(x[j])) is always in the range (0, 1]
• log(anything in the range (0, 1]) is in the range (-inf, 0]

Hence -log(exp(x[class]) / (\sum_j exp(x[j]))) is in the range [0, +inf)
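A quick numerical check of that range (a pure-Python sketch of the per-sample formula, not torch code): a confident correct prediction drives the loss toward 0, while a confident wrong prediction makes it arbitrarily large.

```python
import math

def cross_entropy(logits, target):
    # loss(x, class) = -log(exp(x[class]) / sum_j exp(x[j])), stably computed
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

# confidently correct: loss close to 0 (but never negative)
low = cross_entropy([10.0, 0.0], 0)

# confidently wrong: loss well above 1, showing the range is [0, +inf)
high = cross_entropy([10.0, 0.0], 1)
```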


Thanks for explaining it to me.
