Why isn't train loss = 1 - accuracy?
How is train loss calculated?
Assuming we're talking about a classification task, a model doesn't just take in an image and say "it belongs to class A". Instead, it outputs a set of values (called logits), which essentially express how confident the model is that the image belongs to class A, B, C, and so on.
For example, say that for some image (which belongs to class A), the model outputs the values 0.5, 0.4, and 0.1 (think of them as probabilities for now). What we want the model to output instead is something like 1.0, 0.0, 0.0: very confident that the image belongs to class A, and confident that it doesn't belong to B or C. If we only take our loss to be 1 - accuracy, this information isn't conveyed to the model. We instead want to use something like cross entropy loss, which is the typical way loss is calculated in classification networks. I would suggest reading more into that loss to build an intuition for what's happening.
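To make this concrete, here's a small sketch (the logit values are made up for illustration) showing why two predictions that are both "correct" under accuracy can still get very different cross-entropy losses:

```python
import torch
import torch.nn.functional as F

# The true class is A (index 0). Both predictions below pick class A,
# so accuracy treats them identically -- but their confidence differs.
target = torch.tensor([0])

confident = torch.tensor([[4.0, -2.0, -2.0]])  # strongly favors class A
unsure = torch.tensor([[0.5, 0.4, 0.1]])       # barely favors class A

# Cross entropy rewards confidence in the correct class:
print(F.cross_entropy(confident, target))  # small loss
print(F.cross_entropy(unsure, target))     # noticeably larger loss
```

Accuracy would be 1.0 in both cases, but the loss still tells the model it should push the correct class's probability higher in the second case.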
Thank you for elaborating. So accuracy is just calculated as correct / len(data), whereas loss is something like a probability. Could you please explain more how loss is accumulated in the first epoch, supposing it has 20 batches, and how I could print the test loss with the code below?
```python
def train(epoch):
    model.train()
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        loss = F.nll_loss(model(data), data.y)
        loss.backward()
        optimizer.step()

def test(loader):
    model.eval()
    correct = 0
    for data in loader:
        data = data.to(device)
        with torch.no_grad():
            pred = model(data).max(1)[1]
            correct += pred.eq(data.y).sum().item()
    return correct / len(loader.dataset)
```
Sure, so I posed it as the model outputting probabilities, but in practice the model actually just outputs a raw value for each class (e.g., [20, -15, 1]), and we apply something like softmax to convert these logits into probabilities.
In your example, you're using the negative log-likelihood (NLL), which is actually what cross entropy loss ends up using under the hood. Keep in mind that NLL needs you to first apply log softmax to the output of the model (if this isn't already happening inside the model). See the example here for more details.
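You can verify the relationship yourself: `log_softmax` followed by `nll_loss` gives exactly the same number as `cross_entropy` on the raw logits (using the example logits from above):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[20.0, -15.0, 1.0]])
target = torch.tensor([0])

# nll_loss expects log-probabilities, so apply log_softmax first
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

# cross_entropy applies log_softmax internally
ce = F.cross_entropy(logits, target)

print(torch.allclose(nll, ce))  # True
```

So if your model's final layer already ends in `log_softmax` (common in graph-classification examples), `F.nll_loss` on its output is the right pairing.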
So essentially, what NLL will do for each datapoint is consider whether that datapoint belongs to some class. If it does, we want the model's predicted probability for that class to be as close to 1.0 as possible; otherwise, we want it as close to 0.0 as possible. This is all done in a differentiable way, so we can compute gradients that tell the model how to push each value closer to 1.0 or 0.0 depending on the label.