Hi there, I’m facing some strange behavior in my training code: when I increase the learning rate from 0.01 to 0.02, the validation loss gets way too high (from 100 to even 1000), but after a few iterations (between 20 and 30) it drops to 2.0 and starts decreasing as expected.
Currently, I’m working with GoogLeNet (no pretraining), but I’ve been dealing with this problem for a while with different architectures like ResNet or LeNet. It’s not a problem that persists over time, but I’m sure it indicates that something is not right.
Until now, I’ve been choosing the highest learning rate possible before things break, but in this case, with GoogLeNet, I need the learning rate to be higher.
My training code is as follows:
def train_step(model, criterion, data_set, device, optimizer):
    model.train()
    for images, labels in data_set:
        # Stating variables
        images = images.to(device)
        labels = labels.to(device)
        # Clear gradients
        optimizer.zero_grad()
        # Forward step
        output = model(images)
        # Loss step
        loss = criterion(output, labels)
        # Backward step
        loss.backward()
        optimizer.step()
        # Accuracy
        top_p, prediction = output.topk(1, dim=1)
        equals = (prediction == labels.view(*prediction.shape))
        # Saving data
        total_loss = loss.item()
        total_acc = torch.mean(equals.type(torch.FloatTensor))
        break
    return model, total_loss, total_acc

def val_step(model, criterion, data_set, device):
    with torch.no_grad():
        model.eval()
        for images, labels in data_set:
            # Stating variables
            images = images.to(device)
            labels = labels.to(device)
            # Forward step
            output = model(images)
            # Loss step
            loss = criterion(output, labels)
            # Accuracy
            top_p, prediction = output.topk(1, dim=1)
            equals = (prediction == labels.view(*prediction.shape))
            # Saving data
            total_loss = loss.item()
            total_acc = torch.mean(equals.type(torch.FloatTensor))
            break
    return total_loss, total_acc

(I break out of the loop to use just one batch per call. I know there are better ways, but in this case I’ve just been lazy.)
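For reference, one of the "better ways" of grabbing a single batch is probably `next(iter(...))` instead of a `for` loop with a `break`. This is only a minimal sketch: `first_batch` and `fake_loader` are hypothetical names made up for illustration, and the plain list stands in for a real DataLoader so the snippet runs on its own.

```python
def first_batch(data_set):
    """Return the first (images, labels) pair from any iterable loader."""
    # iter() builds a fresh iterator; next() pulls exactly one item from it.
    return next(iter(data_set))


# Stand-in for a DataLoader (assumption): any iterable of
# (images, labels) pairs behaves the same way here.
fake_loader = [([0.1, 0.2], [0, 1]), ([0.3, 0.4], [1, 0])]

images, labels = first_batch(fake_loader)
```

As far as I understand, this should behave like the loop with `break`: every call creates a new iterator, so a shuffling loader would still hand back a different batch each time.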
Hyperparameters:
num_classes = 7
batch_size = 128
criterion = CrossEntropyLoss()
optimizer = SGD(models[0].parameters(), lr=0.01, momentum=0.7, weight_decay=0.001)
I also have an lr_scheduler, but that is beside the point here. I would appreciate it if someone could tell me how to solve this problem, or whether this is simply how things work and I have to stick with low learning rates.