Higher learning rate causes a huge validation loss spike before it recovers

Hi there, I’m seeing some strange behavior in my training code: after increasing the learning rate from 0.01 to 0.02, the validation loss spikes from around 100 up to 1000, but after 20 to 30 iterations it drops back to about 2.0 and keeps decreasing as expected.

Currently, I’m working with GoogLeNet (no pretraining), but I’ve run into the same problem before with other architectures such as ResNet and LeNet. The spike doesn’t persist over time, but I’m fairly sure it indicates that something is off.

Until now, I’d been choosing the highest learning rate that didn’t destabilize training, but in this case, with GoogLeNet, I need the learning rate to be higher.
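Would a short learning-rate warmup be the right fix for this kind of spike? This is roughly what I have in mind (a minimal sketch using torch.optim.lr_scheduler.LambdaLR; the placeholder model and the 30-step warmup are just illustrative, matching the 20 to 30 iterations the spike lasts):

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 7)  # placeholder model, just for the sketch

optimizer = SGD(model.parameters(), lr=0.02, momentum=0.7, weight_decay=0.001)

warmup_steps = 30  # roughly the length of the spike I observe

def warmup_factor(step):
    # Scale the base lr linearly from 1/warmup_steps up to 1.0,
    # then hold it constant
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda=warmup_factor)

# called once per optimizer.step() inside the training loop:
# scheduler.step()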

My training code is as follows:

import torch

def train_step(model, criterion, data_set, device, optimizer):
    model.train()
    for images, labels in data_set:
        # Move the batch to the target device
        images = images.to(device)
        labels = labels.to(device)

        # Clear gradients
        optimizer.zero_grad()

        # Forward step
        output = model(images)

        # Loss step
        loss = criterion(output, labels)

        # Backward step
        loss.backward()
        optimizer.step()

        # Accuracy: compare the top-1 prediction against the labels
        top_p, prediction = output.topk(1, dim=1)
        equals = (prediction == labels.view(*prediction.shape))

        # Saving data
        total_loss = loss.item()
        total_acc = equals.float().mean().item()

        break  # only process the first batch (see note below)

    return model, total_loss, total_acc

def val_step(model, criterion, data_set, device):
    model.eval()
    with torch.no_grad():
        for images, labels in data_set:
            # Move the batch to the target device
            images = images.to(device)
            labels = labels.to(device)

            # Forward step
            output = model(images)

            # Loss step
            loss = criterion(output, labels)

            # Accuracy: compare the top-1 prediction against the labels
            top_p, prediction = output.topk(1, dim=1)
            equals = (prediction == labels.view(*prediction.shape))

            # Saving data
            total_loss = loss.item()
            total_acc = equals.float().mean().item()

            break  # only process the first batch (see note below)

    return total_loss, total_acc

(I break the loop so that each call only processes a single batch. I know there are cleaner ways to do this; for now I’ve just been lazy.)
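For reference, the cleaner version I have in mind would average the loss and accuracy over every batch instead of stopping after the first one. A minimal sketch of val_step (train_step would change the same way):

def val_step(model, criterion, data_set, device):
    model.eval()
    total_loss, total_acc, num_batches = 0.0, 0.0, 0
    with torch.no_grad():
        for images, labels in data_set:
            images = images.to(device)
            labels = labels.to(device)

            output = model(images)
            loss = criterion(output, labels)

            top_p, prediction = output.topk(1, dim=1)
            equals = (prediction == labels.view(*prediction.shape))

            # Accumulate per-batch metrics instead of keeping only the first batch
            total_loss += loss.item()
            total_acc += equals.float().mean().item()
            num_batches += 1

    # Average over all batches
    return total_loss / num_batches, total_acc / num_batches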

Hyperparameters:

from torch.nn import CrossEntropyLoss
from torch.optim import SGD

num_classes = 7
batch_size = 128

criterion = CrossEntropyLoss()
optimizer = SGD(models[0].parameters(), lr=0.01, momentum=0.7, weight_decay=0.001)
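For context, I drive these two functions from a plain epoch loop along these lines (simplified; train_loader, val_loader, and num_epochs are placeholder names, and the scheduler call is omitted):

device = "cuda" if torch.cuda.is_available() else "cpu"
num_epochs = 50  # illustrative value

for epoch in range(num_epochs):
    # One training and one validation pass (each sees a single batch, as above)
    _, train_loss, train_acc = train_step(models[0], criterion, train_loader, device, optimizer)
    val_loss, val_acc = val_step(models[0], criterion, val_loader, device)

    print(f"epoch {epoch:3d} | train loss {train_loss:.3f} acc {train_acc:.3f} "
          f"| val loss {val_loss:.3f} acc {val_acc:.3f}")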

I also have a lr_scheduler, but that’s beside the point here. I’d appreciate it if someone could suggest a way to solve this problem, or tell me whether things just work like this and I have to stick with low learning rates.