Loss functions for batches

Hi,

Apologies if this seems like a noob question; I’ve read similar issues and their responses and looked at all the related examples.

I’m really confused about what the loss functions expect for their predicted and ideal (target) arguments. I’m building a CNN for image classification and there are 4 possible classes.
When I try to use nn.CrossEntropyLoss I get this error:

RuntimeError: multi-target not supported at ClassNLLCriterion.cu:16

The batch size is 10 and there are 4 classes, so the outputs tensor is 10 x 4, and so is the labels tensor.

The code is (minified):

criterion = nn.CrossEntropyLoss().cuda()
print("using CE loss")

optimizer = torch.optim.SGD(model.parameters(),
                            lr=learning_rate,
                            momentum=0.9,
                            dampening=0,
                            weight_decay=l2_reg,
                            nesterov=False)

model.train()
total_step = len(train_loader)

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        inputs = images.cuda().half()
        labels = labels.cuda().long()
        optimizer.zero_grad()
        outputs = model(inputs)
        print(labels.size())
        print(outputs.size())
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

I understand that the criterion expects to see a 1D tensor, and so I’ve tried reducing it:

print(labels.size())
ideals = labels.view(-1) 
print(ideals.size())
loss = criterion(outputs, ideals)

So now the sizes are:

labels: (10, 4)
ideals: (40,)

And now I get the error of:

ValueError: Expected input batch_size (10) to match target batch_size (40)

Which makes sense. So what I don’t get is:

  • do I need to iterate over the ideal and actual outputs for each input in the batch?
  • or do I need to reduce both tensors to the same size and dimensions?

Most of the examples seem to use this directly:

loss = criterion(outputs, labels)

And the only exception I’ve seen is in the official documentation (https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#training-on-gpu), where the labels are:

_, predicted = torch.max(outputs, 1)

But sadly I don’t see an explanation?

Many thanks!
Alex

The loss functions for classification, e.g. nn.CrossEntropyLoss or nn.NLLLoss, require your target to store the class indices instead of a one-hot encoded tensor.
So if your target looks like:

labels = torch.tensor([[0, 1, 0],
                       [1, 0, 0],
                       [0, 0, 1]])

you would have to get the corresponding indices by:

labels = labels.argmax(1)
print(labels)
> tensor([1, 0, 2])

Now you can use this target tensor for your criterion.

criterion = nn.CrossEntropyLoss()
x = torch.randn(3, 3, requires_grad=True)

loss = criterion(x, labels)
loss.backward()
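
Applied to your training loop above, a minimal sketch (assuming labels comes out of the DataLoader as the 10 x 4 one-hot tensor) could look like:

for i, (images, labels) in enumerate(train_loader):
    inputs = images.cuda().half()
    targets = labels.cuda().argmax(dim=1)  # shape [10], class indices in 0..3
    optimizer.zero_grad()
    outputs = model(inputs)                # shape [10, 4], raw logits
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()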

Your code example calculates just the most probable class and stores its index in predicted.
This is done to calculate the accuracy or just get the most likely class for the current sample.
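
For completeness, a minimal sketch of how predicted is typically used to compute the accuracy (targets here is assumed to be the class-index tensor from above):

_, predicted = torch.max(outputs, 1)             # index of the highest logit per sample, shape [batch_size]
accuracy = (predicted == targets).float().mean() # fraction of correctly classified samples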


@ptrblck

Thank you very much, this worked and makes perfect sense.
I’m just wondering: did I miss it in the documentation or examples? Is this described somewhere in detail? Also, I’m guessing that nn.CrossEntropyLoss uses Softmax internally?

Again, many thanks!

The usage, shapes and types are described in the docs.
It uses nn.LogSoftmax and nn.NLLLoss internally, so you should pass the logits into the criterion.
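
As a small sanity check (just a sketch, not from the docs), you can verify that passing the logits to nn.CrossEntropyLoss gives the same value as applying a log softmax and nn.NLLLoss:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(3, 4)          # raw model outputs, no softmax applied
target = torch.tensor([0, 3, 1])    # class indices

loss_ce = nn.CrossEntropyLoss()(logits, target)
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
print(torch.allclose(loss_ce, loss_nll))
> True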


Excuse me, I have an experiment named “opinion and aspect co-extraction”, which judges whether each word in a sentence is an opinion or aspect word or not, and labels the words as follows:

0: background word
1: beginning of an aspect word
2: inside of an aspect word
3: beginning of an opinion word
4: inside of an opinion word

Therefore, there are five classes for each word in a sentence. For one batch of sentences, the shapes of input_x (after word-to-index), output (after log softmax), and y (gold labels) look like this:

input_x: batch_size x sentence_length
output: batch_size x sentence_length x class_possibility
y: batch_size x sentence_length

After I feed in input_x, train for one epoch, and get the output, I use nn.NLLLoss to compute the loss between output and y, but it raises an error. The following code simulates the situation.

# the output for one batch of 10 sentences: every sentence has 78 words, and every word has a score (after log softmax) for each of the 5 classes
p = torch.rand(10, 78, 5)
# the gold labels for one batch of 10 sentences: every sentence has 78 words, and each word has one index in 0~4 indicating its class, as described above
y = torch.ones(10, 78).long()
# use the NLLLoss function
loss = nn.NLLLoss()
# get the loss value
r = loss(p, y)

and it reports this error:

ValueError: Expected target size (10, 5), got torch.Size([10, 78])

So what I want to know is: how can I compute the loss between the output and the gold labels of a batch?

The class logits should be in dim1 in your use case, so you could permute your output as:

p = p.permute(0, 2, 1)

or alternatively make sure your model outputs the right shapes, but this of course depends on the architecture. :wink:
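
Putting it together for your simulated batch, a minimal sketch (using F.log_softmax so that p actually contains log probabilities, which torch.rand does not give you) could look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(10, 78, 5)             # [batch_size, sentence_length, nb_classes]
p = F.log_softmax(logits, dim=2)            # log probabilities over the 5 classes
y = torch.ones(10, 78).long()               # one gold label per word

loss = nn.NLLLoss()(p.permute(0, 2, 1), y)  # class dim moved to dim1 -> [10, 5, 78]
print(loss)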

I’m so sorry that this is such a simple question!
But I have read the PyTorch docs and I can’t find the information I need to solve my problem. I wonder if the part in the red circle in the screenshot below is what I need?

[screenshot of the nn.NLLLoss documentation, with the shape description for the K-dimensional case circled]

However, I’m sorry that I can’t understand the meaning of “K”; what does “in the case of K-dimensional loss” mean? Thanks for your patient reply!

K is a placeholder for the number of additional dimensions your output and target have.
In a simple classification use case, K would be 0, which means:

  • output = [batch_size, nb_classes], target = [batch_size]

In the case of K=1, e.g. for a temporal signal, where each sample belongs to a specific class:

  • output = [batch_size, nb_classes, seq_len], target = [batch_size, seq_len]

For a segmentation use case:

  • output = [batch_size, nb_classes, height, width], target = [batch_size, height, width]


As you can see, K simply indicates the dimensionality of your current use case and how the output and target should look.
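
A minimal sketch to check these shapes (the sizes here are just placeholders):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
batch_size, nb_classes, seq_len, height, width = 2, 5, 7, 4, 6

# K = 0: plain classification
loss0 = criterion(torch.randn(batch_size, nb_classes),
                  torch.randint(0, nb_classes, (batch_size,)))

# K = 1: e.g. a temporal signal, one class per time step
loss1 = criterion(torch.randn(batch_size, nb_classes, seq_len),
                  torch.randint(0, nb_classes, (batch_size, seq_len)))

# K = 2: e.g. segmentation, one class per pixel
loss2 = criterion(torch.randn(batch_size, nb_classes, height, width),
                  torch.randint(0, nb_classes, (batch_size, height, width)))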


Thank you so much for your simple and useful reply!
Best wishes to you!