Hi, I am following cs230-code-examples and I don't understand the loss function. How does this line work?

```python
return -torch.sum(outputs[range(num_examples), labels])/num_examples
```

Here `outputs` is the result of the model, with shape `batch_size x num_classes`, and `labels` has shape `batch_size`.
Thanks in advance!
Let’s first have a look at `outputs[range(num_examples), labels]`.
As you said, `outputs` contains the model outputs and has the shape `[batch_size, num_classes]`.
`labels` seems to be a `torch.LongTensor` containing the ground-truth class for each sample in the batch.
In this code we are indexing `outputs` with two index sequences of the same length, and they are paired element by element: for each sample `i`, the result contains `outputs[i, labels[i]]`, i.e. the model output for that sample's target class. The result therefore has shape `[batch_size]`.
Here is a small example:
```python
import torch

batch_size = 5
nb_classes = 3
outputs = torch.randn(batch_size, nb_classes)  # random, so your values will differ
labels = torch.randint(0, nb_classes, (batch_size,))

print(outputs)
# tensor([[ 0.6579, -2.1024, -0.4000],
#         [-0.3348, -0.4195, -1.5200],
#         [-0.3317,  0.6184,  1.7048],
#         [-0.1368, -1.1512, -0.6306],
#         [-0.3990, -1.2909, -0.8157]])
print(labels)
# tensor([0, 0, 1, 2, 1])
print(outputs[torch.arange(batch_size), labels])
# tensor([ 0.6579, -0.3348,  0.6184, -0.6306, -1.2909])

# gather yields the same values; it returns shape [batch_size, 1],
# so squeeze(1) is needed to get the same [batch_size] shape
print(outputs.gather(1, labels[:, None]).squeeze(1))
```
As you can see, `labels` is used as the column index to pick the output value of the target class from each row of `outputs`.
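If the paired indexing still feels opaque, it may help to compare it with an explicit loop. This is a small sketch with random data (the variable names just reuse the ones from the example above); it checks that the advanced index and a per-row Python loop select exactly the same elements.

```python
import torch

# Random model outputs and integer class labels (illustrative only).
batch_size, nb_classes = 5, 3
outputs = torch.randn(batch_size, nb_classes)
labels = torch.randint(0, nb_classes, (batch_size,))

# Advanced indexing: the two index tensors are paired element-wise,
# so row i is combined with column labels[i].
picked = outputs[torch.arange(batch_size), labels]

# The same selection written as an explicit Python loop.
looped = torch.stack([outputs[i, int(labels[i])] for i in range(batch_size)])

print(torch.equal(picked, looped))
```

The loop is only for understanding; the indexed version does the same selection in a single vectorized operation.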
In the next step, these values are summed, divided by the batch size (i.e. averaged), and multiplied by -1.
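Putting both steps together: if the model's last layer is a `log_softmax` (an assumption here, since only the loss line was shown), `outputs` holds log-probabilities and the line computes the mean negative log-likelihood. The sketch below wraps the line in a function (the name `loss_fn` is just for illustration, and `torch.arange` is used in place of `range`) and compares it with PyTorch's built-in `F.nll_loss` and `F.cross_entropy`.

```python
import torch
import torch.nn.functional as F

def loss_fn(outputs, labels):
    # outputs: [batch_size, num_classes], assumed to be log-probabilities
    # labels:  [batch_size], integer class indices
    num_examples = outputs.size(0)
    # pick the log-probability of the target class per sample,
    # sum, divide by the batch size, and negate
    return -torch.sum(outputs[torch.arange(num_examples), labels]) / num_examples

torch.manual_seed(0)
logits = torch.randn(5, 3)               # raw scores from a hypothetical model
labels = torch.randint(0, 3, (5,))
log_probs = F.log_softmax(logits, dim=1)

manual = loss_fn(log_probs, labels)
builtin_nll = F.nll_loss(log_probs, labels)  # default reduction is the mean
builtin_ce = F.cross_entropy(logits, labels)  # log_softmax + nll_loss on raw logits
print(manual, builtin_nll, builtin_ce)
```

All three values should agree up to floating-point rounding. Note that if the model returned raw logits instead of log-probabilities, the hand-written line would no longer be a proper cross-entropy loss.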