Why do my train and validation losses keep growing as the epochs go on?

With each epoch my train loss increases, and I can't find the error in my training code. Does anyone have any ideas?

for e in range(epochs):
    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0
    running_loss = 0.0
    running_corrects = 0.0

    cont = 0
    for inputs, label in dataloaders['train']:
        # move the batch to the GPU if one is available
        if train_on_gpu:
            inputs, label = inputs.cuda(), label.cuda()
        #inputs, labels = i ,data

        with torch.set_grad_enabled(True):
            logps = model(inputs)
            _, preds = torch.max(logps, 1)  # new validation technique
            loss = criterion(logps, label)
        running_loss += loss.item()
        print("running_loss = %f , interaction = %i " % (running_loss, cont))
        running_corrects += torch.sum(preds == label.data)
        cont += 1

    for inputs, label in dataloaders['valid']:
        # move the batch to the GPU if one is available
        if train_on_gpu:
            inputs, label = inputs.cuda(), label.cuda()
        with torch.no_grad():
            logps = model(inputs)
            _, preds = torch.max(logps, 1)  # new validation technique
            loss = criterion(logps, label)
        # accumulate the validation loss
        valid_loss += loss.item()

    # calculate average losses
    epoch_loss_train = running_loss / dataset_sizes['train']
    epoch_acc_train = running_corrects.double() / dataset_sizes['train']

    epoch_loss_valid = valid_loss / dataset_sizes['valid']

    print('{} Loss: {:.4f} \tAcc: {:.4f}'.format('train', epoch_loss_train, epoch_acc_train))
    print('{} \tLoss: {:.4f} '.format('valid', epoch_loss_valid))


Both losses are decreasing, which is generally fine.
Which criterion are you using, and what kind of use case are you working on? The range of the loss values looks a bit unusual.
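One possible reason for an unusual loss range, assuming the criterion uses the default reduction="mean": loss.item() is then already the mean over the batch, so summing it per batch and dividing by the dataset size scales the value down by roughly the batch size. The sketch below uses made-up logits and labels (not the data from the post) to compare the two ways of averaging.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumption: a mean-reducing criterion; the post does not say which one is used.
criterion = nn.CrossEntropyLoss()  # default reduction="mean"

# Made-up data: 10 samples, 3 classes, processed in batches of 4.
logits = torch.randn(10, 3)
labels = torch.randint(0, 3, (10,))
batch_size = 4

sum_of_batch_means = 0.0    # what the posted loop accumulates
sum_of_sample_losses = 0.0  # batch mean multiplied back by the batch size

for start in range(0, len(labels), batch_size):
    batch_logits = logits[start:start + batch_size]
    batch_labels = labels[start:start + batch_size]
    loss = criterion(batch_logits, batch_labels)
    sum_of_batch_means += loss.item()
    sum_of_sample_losses += loss.item() * batch_labels.size(0)

as_in_post = sum_of_batch_means / len(labels)    # too small by about the batch size
per_sample = sum_of_sample_losses / len(labels)  # true average loss per sample
full_mean = criterion(logits, labels).item()     # reference: mean over all samples

print(as_in_post, per_sample, full_mean)
```

Multiplying by the batch size before accumulating (or dividing the sum of batch means by the number of batches) gives a per-sample average that is comparable across batch sizes.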


I fixed the problem. I had set model.fc = classifier on my models.densenet121, and I think this was the source of the error. But I do not know why. What is the difference between model.fc and model.classifier?

model.fc and model.classifier are just the internal names for some submodules.
If you print the model, you’ll find the name of the last layer, which you could replace with a custom one for your use case:

model = models.densenet121()
print(model)

In this example, densenet121 uses the attribute name classifier for the last nn.Linear layer, so that is the attribute you should replace.
If you just assign a custom linear layer to model.fc, it is registered as an extra submodule, but the forward method never calls it. It will be neither used nor trained unless you also change the forward method.
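Here is a minimal sketch of that behavior. TinyNet is a hypothetical stand-in, not the torchvision model; it only mimics the fact that forward calls an attribute named classifier. The same reasoning should carry over to densenet121.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a model whose head is named `classifier`."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 1000)

    def forward(self, x):
        # Only `features` and `classifier` are used here; an attribute
        # called `fc` would be ignored.
        return self.classifier(torch.relu(self.features(x)))


x = torch.randn(2, 8)

# Wrong attribute name: the new layer is registered but never called.
wrong = TinyNet()
wrong.fc = nn.Linear(4, 3)
wrong_out = wrong(x)
wrong_shape = tuple(wrong_out.shape)  # still 1000 outputs per sample
wrong_out.sum().backward()
fc_grad = wrong.fc.weight.grad        # stays None: the layer gets no gradient

# Correct attribute name: the new layer replaces the old head.
right = TinyNet()
right.classifier = nn.Linear(right.classifier.in_features, 3)
right_out = right(x)
right_shape = tuple(right_out.shape)  # now 3 outputs per sample
right_out.sum().backward()
classifier_grad = right.classifier.weight.grad

print(wrong_shape, fc_grad is None)
print(right_shape, classifier_grad is not None)
```

Checking the output shape after swapping the head is a quick way to confirm that the replacement layer is actually on the forward path.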
