I’m confused about the way that I calculate my loss function


Here is the function:


import torch
from sklearn.metrics import roc_auc_score

# `device` is defined elsewhere in my script
def test_epoch(iterator, model, criterion):
    train_loss = 0
    all_y = []
    all_y_hat = []

    model.eval()

    for batch in iterator:
        y = torch.stack([batch.toxic,
                         batch.severe_toxic,
                         batch.obscene,
                         batch.threat,
                         batch.insult,
                         batch.identity_hate], dim=1).float().to(device)

        text, length = batch.comment_text
        length = length.to('cpu')

        with torch.no_grad():
            y_hat = model(text, length)

        loss = criterion(y_hat, y)
        train_loss += loss.item()

        all_y.append(y)
        all_y_hat.append(y_hat)

    y = torch.vstack(all_y)
    y_hat = torch.vstack(all_y_hat)

    roc = roc_auc_score(y.cpu(), y_hat.round().detach().cpu())

    return train_loss / len(y), roc

This is the part of the function above that computes the loss:


train_loss = 0
...
loss = criterion(y_hat, y)
...
train_loss += loss.item()
...
return train_loss / len(y), roc

After the first epoch it gives:

Loss: 0.0148(valid) | roc: 0.547727 (valid)

But when I calculate the loss this way:


all_loss = []
...
loss = criterion(y_hat, y)
...
all_loss.append(loss.item())
...
return np.mean(all_loss), roc

after the first epoch it gives:

Loss: 0.7691(valid) | roc: 0.548824 (valid)

Why is the loss from the first approach so different from the loss from the second, and which one should I use or rely on?

Thanks!

Could you provide an executable code snippet using random data to reproduce the different results, please?
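A minimal reproducer along those lines, as a sketch only: the 1000 random samples, the 20 input features, the toy nn.Linear model, batch_size=32 and the use of nn.BCEWithLogitsLoss for the six toxic-comment labels are all assumptions, not the original poster's setup. It computes both versions of the epoch loss in one pass:

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for the toxic-comment data:
# 1000 samples, 20 features, 6 binary labels.
X = torch.randn(1000, 20)
Y = (torch.rand(1000, 6) > 0.5).float()
loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=False)

model = nn.Linear(20, 6)            # arbitrary toy model (assumption)
criterion = nn.BCEWithLogitsLoss()  # default reduction='mean' -> one average per batch


def evaluate(loader, model, criterion):
    model.eval()
    running_sum = 0.0  # first approach: sum of the per-batch means
    all_loss = []      # second approach: list of the per-batch means
    n_samples = 0
    with torch.no_grad():
        for xb, yb in loader:
            loss = criterion(model(xb), yb)
            running_sum += loss.item()
            all_loss.append(loss.item())
            n_samples += yb.size(0)
    first = running_sum / n_samples    # like train_loss / len(y): divides by samples
    second = float(np.mean(all_loss))  # like np.mean(all_loss): divides by batches
    fixed = running_sum / len(loader)  # first approach divided by the number of batches
    return first, second, fixed


first, second, fixed = evaluate(loader, model, criterion)
print(f"sum / len(y):      {first:.4f}")
print(f"np.mean(all_loss): {second:.4f}")
print(f"sum / len(loader): {fixed:.4f}")
print(f"ratio second/first: {second / first:.2f} (samples / batches = 1000 / {len(loader)})")
```

The first number comes out much smaller than the other two, and the ratio between them is the number of samples divided by the number of batches.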

Most likely because in the first case you:

  1. accumulate loss.item(), which by default (reduction='mean') is already averaged over the batch, and
  2. then divide that sum by the number of samples in the whole epoch, len(y).

and in the second case you:

  1. append every loss.item() to a list, and
  2. average over the number of items in that list, np.mean(all_loss).

The second way, I suppose, gives the correct result. To fix the first case, divide by the number of batches instead: train_loss / len(iterator). Then both results should be equal, I guess.

To summarize: len(y) in the first case is not equal to len(all_loss) in the second case.
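The same point written out, using notation introduced here only for illustration: $N$ samples, $B$ batches, and $\bar\ell_b$ the batch-averaged value of loss.item() for batch $b$:

$$
\text{first} = \frac{1}{N}\sum_{b=1}^{B}\bar\ell_b, \qquad
\text{second} = \frac{1}{B}\sum_{b=1}^{B}\bar\ell_b, \qquad
\frac{\text{second}}{\text{first}} = \frac{N}{B}
$$

$N/B$ is close to the batch size. With the numbers from the question, 0.7691 / 0.0148 ≈ 52, which would fit a batch size of roughly 52, although the thread does not state the actual batch size.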


In case you want a code example (sorry, it is lengthy):

import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False, num_workers=2)
net = torchvision.models.resnet18()
net.fc = nn.Linear(512, 10)  # for CIFAR10
net.to(device)
criterion = nn.CrossEntropyLoss()

Case 1:

def test_epoch_case1(iterator, model, criterion):

    train_loss = 0
    all_y = []

    model.eval()

    for batch in iterator:
        X, y = batch[0], batch[1]
        X, y = X.to(device), y.to(device)

        with torch.no_grad():
            y_hat = model(X)

        loss = criterion(y_hat, y)
        train_loss += loss.item()

        all_y.append(y)

    y = torch.cat(all_y)
    print(f'This function returns collected train_loss: {train_loss} averaged by number of samples in y: {len(y)}')

    return train_loss / len(y)

test_epoch_case1(testloader, net, criterion)

prints:

This function returns collected train_loss: 362.682909488678 averaged by number of samples in y: 10000
0.0362682909488678

Case 2:

def test_epoch_case2(iterator, model, criterion):

    all_loss = []
    model.eval()

    for batch in iterator:
        X, y = batch[0], batch[1]
        X, y = X.to(device), y.to(device)

        with torch.no_grad():
            y_hat = model(X)

        loss = criterion(y_hat, y)
        all_loss.append(loss.item())

    print(f'This function returns collected train_loss: {np.sum(all_loss)} averaged by number of batches in dataloader: {len(iterator)} = {np.sum(all_loss) / len(iterator)}')

    return np.mean(all_loss)

test_epoch_case2(testloader, net, criterion)

prints:

This function returns collected train_loss: 362.682909488678 averaged by number of batches in dataloader: 157 = 2.310082226042535
2.310082226042535

Case 1 (corrected):

def test_epoch_case1_correct(iterator, model, criterion):

    train_loss = 0
    all_y = []

    model.eval()

    for batch in iterator:
        X, y = batch[0], batch[1]
        X, y = X.to(device), y.to(device)

        with torch.no_grad():
            y_hat = model(X)

        loss = criterion(y_hat, y)
        train_loss += loss.item()

        all_y.append(y)

    y = torch.cat(all_y)
    print(f'This function returns collected train_loss: {train_loss} averaged by number of batches in dataloader: {len(iterator)}')

    return train_loss / len(iterator)

test_epoch_case1_correct(testloader, net, criterion)

prints:

This function returns collected train_loss: 362.682909488678 averaged by number of batches in dataloader: 157
2.310082226042535
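One nuance that goes slightly beyond the question: even the corrected version is a mean of per-batch means. With 10000 CIFAR10 test images and batch_size=64, the last of the 157 batches holds only 16 images, so each of those images weighs a bit more than the others. If you want the exact per-sample average, one option is to weight each batch loss by its batch size. A sketch on random data (the toy nn.Linear model, 1000 samples and 20 features are assumptions, not part of the example above):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical data: 1000 samples with batch_size=64 gives
# 15 full batches plus a last batch of only 40 samples.
X = torch.randn(1000, 20)
y = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=False)
model = nn.Linear(20, 10)          # arbitrary toy model (assumption)
criterion = nn.CrossEntropyLoss()  # reduction='mean': one average per batch


def mean_of_batch_means(loader, model, criterion):
    # What test_epoch_case1_correct / test_epoch_case2 compute
    total = 0.0
    with torch.no_grad():
        for xb, yb in loader:
            total += criterion(model(xb), yb).item()
    return total / len(loader)


def per_sample_mean(loader, model, criterion):
    # Re-weight each batch mean by its batch size, then divide by the sample count
    total, n = 0.0, 0
    with torch.no_grad():
        for xb, yb in loader:
            total += criterion(model(xb), yb).item() * yb.size(0)
            n += yb.size(0)
    return total / n


model.eval()
print(mean_of_batch_means(loader, model, criterion))
print(per_sample_mean(loader, model, criterion))
```

The two values are usually very close; they only differ because the last batch is smaller, and they match exactly when every batch has the same size.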

Hope it helps!
