Loss function misunderstanding

Mohammed_Awney · February 25, 2019, 10:06pm

Hi, I am following cs230-code-examples
and I don't understand the loss function.
How does this line work?
"return -torch.sum(outputs[range(num_examples), labels])/num_examples"
where outputs is the result of the model with shape = batch_size x num_classes, and labels has shape = batch_size.
Thanks in advance

ptrblck · February 26, 2019, 12:17am

Let’s first have a look at outputs[range(num_examples), labels].
As you said, outputs contains the model outputs and has the shape [batch_size, num_classes].
labels seems to be a torch.LongTensor, containing the ground truth class for each sample in the batch.
In this code we are indexing outputs so that, for each sample, the result contains the model output at the position of its label.
Here is a small example:

import torch

batch_size = 5
nb_classes = 3
outputs = torch.randn(batch_size, nb_classes)
labels = torch.randint(0, nb_classes, (batch_size,))
print(outputs)
# > tensor([[ 0.6579, -2.1024, -0.4000],
#           [-0.3348, -0.4195, -1.5200],
#           [-0.3317,  0.6184,  1.7048],
#           [-0.1368, -1.1512, -0.6306],
#           [-0.3990, -1.2909, -0.8157]])
print(labels)
# > tensor([0, 0, 1, 2, 1])
print(outputs[torch.arange(batch_size), labels])
# > tensor([ 0.6579, -0.3348,  0.6184, -0.6306, -1.2909])

# This would yield the same values; gather returns shape [batch_size, 1],
# so squeeze(1) brings it back to [batch_size]
outputs.gather(1, labels[:, None]).squeeze(1)

As you can see, labels is used as an index to get the class logit of your output.
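
To make the pairing explicit: the indexing takes row i together with column labels[i]. Here is a small sketch comparing it with a plain Python loop. The tensor values are made up for illustration and are not from the original code.

```python
import torch

# Hypothetical fixed values so the result is reproducible.
outputs = torch.tensor([[0.1, 0.2, 0.7],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.4, 0.3]])
labels = torch.tensor([2, 0, 1])

# Advanced indexing pairs row i with column labels[i].
indexed = outputs[torch.arange(len(labels)), labels]

# The same selection written as an explicit loop over the batch.
looped = torch.stack([outputs[i, labels[i]] for i in range(len(labels))])

print(indexed)                       # tensor([0.7000, 0.8000, 0.4000])
print(torch.equal(indexed, looped))  # True
```

The indexed version does the whole batch in one vectorized operation, which is why the loss is written that way.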

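In the next step, these values are summed and divided by the batch size (i.e. averaged), then multiplied by -1. If outputs holds log probabilities (e.g. from log_softmax), the result is the mean negative log-likelihood of the correct classes.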
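
As a rough check of the full line, here is a sketch that wraps it in a function and compares it with the built-in losses. It assumes the model output is a log_softmax over the classes, which this thread does not show. The name loss_fn, the seed, and the tensor sizes are picked for illustration, and torch.arange stands in for range.

```python
import torch
import torch.nn.functional as F

def loss_fn(outputs, labels):
    # Pick the output at each sample's label, average, and negate.
    num_examples = labels.size(0)
    picked = outputs[torch.arange(num_examples), labels]
    return -torch.sum(picked) / num_examples

torch.manual_seed(0)
logits = torch.randn(5, 3)                # hypothetical raw model scores
log_probs = F.log_softmax(logits, dim=1)  # assumed final layer of the model
labels = torch.randint(0, 3, (5,))

manual = loss_fn(log_probs, labels)
builtin = F.nll_loss(log_probs, labels)        # expects log probabilities
from_logits = F.cross_entropy(logits, labels)  # expects raw scores

print(torch.allclose(manual, builtin))      # True
print(torch.allclose(manual, from_logits))  # True
```

Under that assumption the hand-written line matches F.nll_loss with its default mean reduction, and F.cross_entropy gives the same value when applied to the raw scores directly.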