Why does validation loss increase while training loss decreases?

My validation loss increases while my training loss decreases, and accuracy increases for both training and validation.

I’m trying to figure out why that could be and how to fix it. Is it overfitting, or could it be something else?

It looks like an overfitting problem, and you should also check your validation function.

This is my function; is there anything I can fix here?

import copy

import numpy as np
import torch


def train_model(model, criterion, optimizer, num_epochs):

    best_acc = 0.0

    for epoch in range(num_epochs):
        print("Epoch {}/{}".format(epoch, num_epochs))
        print('-' * 10)

        loss_train = 0
        loss_val = 0
        acc_train = 0
        acc_val = 0
        min_valid_loss = np.inf

        # model.train(True)

        for batch, (inputs, labels) in enumerate(train_data):

            inputs, labels = inputs.cuda(), labels.cuda()

            # Clear the gradients
            optimizer.zero_grad()
            # Forward Pass
            outputs = model(inputs)
            # Find the Loss
            loss = criterion(outputs, labels)
            # Calculate gradients
            loss.backward()
            # Update Weights
            optimizer.step()
            # Calculate Loss
            loss_train += float(loss)

            _, preds = torch.max(outputs.data, 1)

            acc_train += int(torch.sum(preds == labels.data)) / len(preds)

        avg_loss = loss_train / len(train_data)
        avg_acc = acc_train / len(train_data)


        for batch, (inputs, labels) in enumerate(val_data):

            inputs, labels = inputs.cuda(), labels.cuda()

            # Forward Pass
            outputs = model(inputs)
            # Find the Loss
            loss = criterion(outputs, labels)
            # Calculate Loss
            loss_val += loss.item()

            _, preds = torch.max(outputs.data, 1)

            acc_val += int(torch.sum(preds == labels.data)) / len(preds)

            del inputs, labels, outputs, preds
            torch.cuda.empty_cache()

        avg_loss_val = loss_val / len(val_data)
        avg_acc_val = acc_val / len(val_data)

        print()
        print("Epoch {} result: ".format(epoch))
        print("Avg loss (train): {:.4f}".format(avg_loss))
        print("Avg acc (train): {:.4f}".format(avg_acc))
        print("Avg loss (val): {:.4f}".format(avg_loss_val))
        print("Avg acc (val): {:.4f}".format(avg_acc_val))
        print('-' * 10)
        print()

        if avg_acc_val > best_acc:
            best_acc = avg_acc_val
            best_model_wts = copy.deepcopy(model.state_dict())


    print("Best acc: ", best_acc)

    model.load_state_dict(best_model_wts)
    return model

It has some problems. First, it seems you never switch to eval mode; try using model.eval(). Second, you don't stop gradient tracking during validation; try wrapping the validation loop in with torch.no_grad():.

And you don't clear the loss at the end of the eval loop either; maybe that is the cause of your problem.
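For example, the validation part could look roughly like this (a minimal sketch, reusing the val_data, model, and criterion from your function):

    # Validation phase: switch to eval mode and disable gradient tracking
    model.eval()
    loss_val = 0.0
    acc_val = 0.0
    with torch.no_grad():
        for inputs, labels in val_data:
            inputs, labels = inputs.cuda(), labels.cuda()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss_val += loss.item()

            _, preds = torch.max(outputs, 1)
            acc_val += int(torch.sum(preds == labels)) / len(preds)

    # switch back to training mode before the next training epoch
    model.train()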

Where should I switch to model.eval() mode, for training or for validation?

And I already did cleanup like this

    del inputs, labels, outputs, preds
    torch.cuda.empty_cache()

after the training part, but it didn’t help.

So I tried different ways of switching to eval mode and added with torch.no_grad(), but it still didn’t improve anything :crying_cat_face:

Hello, sorry to disturb you. I’m running into the same issue. Could you please give me some advice on how to solve it? Thanks, best wishes.

  1. Try adding dropout layers with p=0.25 to 0.5.

  2. Add augmentations to the data (this will be specific to the dataset you’re working with).

  3. Increase the size of your training dataset.

  4. Alternatively, you can try a high learning rate and batch size (see super-convergence): OneCycleLR — PyTorch 1.11.0 documentation. A sketch of points 1 and 4 follows below.
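For points 1 and 4, here is a minimal sketch of what that could look like (the layer sizes, learning rate, and the train_data / num_epochs names are illustrative placeholders, not tuned values):

    import torch
    import torch.nn as nn

    # 1. Dropout: an illustrative classifier head with a dropout layer
    model = nn.Sequential(
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Dropout(p=0.5),   # try p between 0.25 and 0.5
        nn.Linear(256, 10),
    ).cuda()

    # 4. High learning rate with OneCycleLR (super-convergence)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=0.1,
        steps_per_epoch=len(train_data),  # number of batches per epoch
        epochs=num_epochs,
    )

    # inside the batch loop, call scheduler.step() right after optimizer.step()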