and I have a misunderstanding about the loss function.
How does this line work?
`return -torch.sum(outputs[range(num_examples), labels])/num_examples`
where `outputs` is the result of the model with shape `[batch_size, num_classes]`, and `labels` has shape `[batch_size]`.

Let’s first have a look at `outputs[range(num_examples), labels]`.
As you said, `outputs` contains the model outputs and has the shape `[batch_size, num_classes]`.
`labels` seems to be a `torch.LongTensor`, containing the ground truth class for each sample in the batch.
In this code we are basically indexing `outputs`, such that the result will contain the model output corresponding to the label.
Here is a small example:

```python
import torch

batch_size = 5
nb_classes = 3
outputs = torch.randn(batch_size, nb_classes)
labels = torch.randint(0, nb_classes, (batch_size,))
print(outputs)
> tensor([[ 0.6579, -2.1024, -0.4000],
[-0.3348, -0.4195, -1.5200],
[-0.3317,  0.6184,  1.7048],
[-0.1368, -1.1512, -0.6306],
[-0.3990, -1.2909, -0.8157]])
print(labels)
> tensor([0, 0, 1, 2, 1])
print(outputs[torch.arange(batch_size), labels])
> tensor([ 0.6579, -0.3348,  0.6184, -0.6306, -1.2909])

# This yields the same values (gather returns shape [batch_size, 1],
# so we squeeze the extra dimension)
outputs.gather(1, labels[:, None]).squeeze(1)
```

As you can see, `labels` is used as an index to select, for each sample, the output corresponding to its target class.

In the next step, these values are summed, divided by the batch size (i.e. averaged), and multiplied by `-1`. If `outputs` contains log-probabilities (e.g. from `log_softmax`), this is exactly the negative log-likelihood loss.
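To tie it together, here is a small check (assuming the model outputs are log-probabilities via `log_softmax`) showing that this expression matches PyTorch's built-in `F.nll_loss`:

```python
import torch
import torch.nn.functional as F

batch_size, num_classes = 5, 3
torch.manual_seed(0)

# Assume the model produces log-probabilities, e.g. via log_softmax
outputs = F.log_softmax(torch.randn(batch_size, num_classes), dim=1)
labels = torch.randint(0, num_classes, (batch_size,))

# The loss from the question: select each sample's target log-prob,
# average over the batch, and negate
loss = -torch.sum(outputs[torch.arange(batch_size), labels]) / batch_size

# Matches the built-in negative log-likelihood loss (reduction='mean')
print(torch.allclose(loss, F.nll_loss(outputs, labels)))
```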
