Loss increasing instead of decreasing

gcamilo · May 22, 2018, 6:03am

For some reason, my loss is increasing instead of decreasing.

These are my train/test functions:

def train(model, device, train_input, optimizer, criterion, epoch):
	model.train()
	len_train = len(train_input)
	batch_size = args['batch_size']
	for idx in range(0, len(train_input), batch_size):
		optimizer.zero_grad()
		batch = train_input[idx: idx + batch_size]
		X, y = get_batch(batch, device)
		output = model(X)
		loss = criterion(output, y)
		loss.backward()
		optimizer.step()
		pred = output.max(1, keepdim=True)[1]
		correct = pred.eq(y.view_as(pred)).sum().item()
		if idx % 400 == 0:
			print("\nTrain epoch {} / {} [{} - {} / {}]\nLoss = {:.4}\nOutput = {}\n".format(epoch, 
				args['n_epochs'], idx, idx + batch_size, len_train, loss / batch_size, output))
			print('Correct {}'.format(correct))


def test(model, device, test_input, criterion):
	model.eval()
	test_loss = 0
	correct = 0
	batch_size = args['batch_size']
	with torch.no_grad():
		for idx in range(0, len(test_input), batch_size):
			batch = test_input[idx: idx + batch_size]
			X, y = get_batch(batch, device)
			output = model(X)
			test_loss += criterion(output, y)
			pred = output.max(1, keepdim=True)[1]
			correct += pred.eq(y.view_as(pred)).sum().item()
			if idx % 10 == 0:
				print('Pred {} Label {}'.format(pred, y))

	test_loss /= len(test_input)
	validation_data['loss'].append(test_loss)
	validation_data['acc'].append(correct / len(test_input))
	print("\nTest set: Average Loss: {:.4}, Accuracy: {}".format(test_loss,
	 correct / len(test_input)))

These are the criterion and optimizer:

optimizer = optim.Adam(model.parameters(), lr=args['initial_lr'], weight_decay=args['weight_decay'], amsgrad=True)
criterion = nn.CrossEntropyLoss().cuda()

Why is the loss increasing? The validation accuracy is increasing just a little bit.

[Image: loss graph]

Thank you.

ptrblck · May 22, 2018, 10:36am

The loss does indeed look a bit fishy.
You don’t have to divide the loss by the batch size, since your criterion already computes the average of the batch loss. If your batch size is constant, though, this only rescales the logged value and can’t explain your loss issue.

Could you post some more information regarding your experiment?
How high is your learning rate? Could you post your model architecture?
What kind of data do you have?
Is the loss increasing in each epoch or just at the beginning of training?
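
As a minimal sketch of the averaging point above (the tensors and sizes are made up for illustration): `nn.CrossEntropyLoss` with its default reduction returns the mean over the batch, so dividing by the batch size again only rescales the number that gets logged. For an evaluation loop, one way to get a per-sample average is to weight each batch mean by the number of samples in that batch before dividing by the dataset size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up data: 25 samples, 4 classes, so the last batch of 10 is a short one.
logits = torch.randn(25, 4)
labels = torch.randint(0, 4, (25,))
batch_size = 10

criterion = nn.CrossEntropyLoss()  # default reduction is 'mean'

# The default criterion equals the per-sample losses averaged over the batch.
per_sample = nn.CrossEntropyLoss(reduction='none')(logits[:10], labels[:10])
batch_mean = criterion(logits[:10], labels[:10])
assert torch.allclose(batch_mean, per_sample.mean())

# Per-sample average over the whole set: weight each batch mean
# by the number of samples in that batch, then divide by the total.
total = 0.0
for idx in range(0, len(logits), batch_size):
    out = logits[idx: idx + batch_size]
    y = labels[idx: idx + batch_size]
    total += criterion(out, y).item() * len(y)
avg_loss = total / len(logits)

# Reference value: the criterion applied to the full set at once.
full_loss = criterion(logits, labels).item()
print("average loss: {:.4f} (full-set loss: {:.4f})".format(avg_loss, full_loss))
```

In the test function posted above, the batch means are summed and then divided by len(test_input), so the reported value comes out roughly batch_size times smaller than the per-sample average. That affects the scale of the curve, not its direction.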

gcamilo · May 22, 2018, 10:46am

My batch size is constant and equal to 10.
My learning rate starts at 1e-3 and I’m using decay:

	optim_scheduler = optim.lr_scheduler.StepLR(optimizer, 10, 0.989)
[...]
	optim_scheduler.step()

The architecture that I’m trying is pretty much convolutional layers followed by max pool layers (the last one is an adaptive max pool), using ReLU and batch normalization. After that I have a GRU layer and a fully connected part with a single hidden layer.
My inputs are variable-sized arrays that are padded within each batch. I’m padding as little as possible, since I sort the dataset by the length of the array.
I’m training only for a small number of epochs since the error is weird, but I believe that it would keep increasing.

(Following something I found in the forum, I added the parameter amsgrad=True in my Adam optimizer, but I still have this loss problem)
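
A small sketch of what the schedule above does, assuming `optim_scheduler.step()` is called once per epoch (the thread does not show where it is called) and using a dummy parameter in place of the real model. `StepLR(optimizer, 10, 0.989)` multiplies the learning rate by 0.989 every 10 scheduler steps.

```python
import torch
from torch import optim

# Dummy parameter standing in for the real model (an assumption for this sketch).
param = torch.nn.Parameter(torch.ones(1))
optimizer = optim.Adam([param], lr=1e-3, amsgrad=True)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.989)

lrs = []
for epoch in range(100):
    # Record the learning rate used during this epoch.
    lrs.append(optimizer.param_groups[0]['lr'])

    # One dummy optimization step so the optimizer runs before the scheduler.
    optimizer.zero_grad()
    loss = (param ** 2).sum()
    loss.backward()
    optimizer.step()

    scheduler.step()

# Closed form of the same schedule: lr = initial_lr * gamma ** (epoch // step_size)
expected_last = 1e-3 * 0.989 ** (99 // 10)
print("lr at epoch 0:  {:.3e}".format(lrs[0]))
print("lr at epoch 99: {:.3e} (closed form {:.3e})".format(lrs[-1], expected_last))
```

Under that assumption the learning rate is still around 90% of its initial value after 100 epochs, so this decay is very gentle. If the scheduler were stepped once per batch instead, the decay would be correspondingly faster.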

gcamilo · May 22, 2018, 10:56am

After some small changes, I ran the model again and I also saved the training loss/acc:

ptrblck · May 22, 2018, 11:02am

This looks better now. It seems that your model is overfitting, since the training loss is decreasing, while the validation loss starts to increase.
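
The pattern described here can be checked from the recorded curves. The sketch below is only an illustration: the loss values are made up, and `best_epoch` is a hypothetical helper, not something from the original code. It finds the epoch with the lowest validation loss and flags the case where the training loss keeps falling afterwards while the validation loss rises.

```python
def best_epoch(val_losses):
    """Return the index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


# Made-up per-epoch losses, standing in for lists such as validation_data['loss'].
train_losses = [1.20, 0.95, 0.80, 0.66, 0.55, 0.47]
val_losses = [1.25, 1.05, 0.98, 1.02, 1.10, 1.21]

best = best_epoch(val_losses)

# Overfitting signature: after the best epoch, the training loss is lower
# than it was then, while the validation loss is higher.
overfitting = (
    best < len(val_losses) - 1
    and train_losses[-1] < train_losses[best]
    and val_losses[-1] > val_losses[best]
)

print("best validation epoch:", best)
print("looks like overfitting:", overfitting)
```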

Just out of curiosity, what were the small changes?

gcamilo · May 22, 2018, 11:04am

Learning rate, weight decay and optimizer (I tried both Adam and SGD).

gcamilo · May 22, 2018, 11:53am

I would just like to take the opportunity to ask something about the RNN input. The docs say that the tensor should be (Batch, Sequence, Features) when using batch_first=True; however, my input is (Batch, Features, Sequence). Is x.permute(0, 2, 1) the correct way to fix the input shape? Or should I unbind and then stack it?

ptrblck · May 22, 2018, 1:29pm

It looks correct to me.
Maybe you would have to call .contiguous() on it, if it throws an error in your forward pass.
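
A short sketch of the permute approach, with made-up sizes and a standalone `nn.GRU` in place of the model from this thread. It shows that `permute(0, 2, 1)` gives the same result as unbinding and stacking, that the permuted tensor is a non-contiguous view, and where `.contiguous()` becomes necessary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up sizes: batch of 10, 32 features, 50 time steps, GRU hidden size 16.
batch, features, seq_len, hidden = 10, 32, 50, 16
x = torch.randn(batch, features, seq_len)   # (Batch, Features, Sequence)

# Swap the last two dimensions: (Batch, Sequence, Features).
x_seq = x.permute(0, 2, 1)

# Unbinding along the sequence dimension and stacking gives the same tensor,
# so the extra work is not needed.
stacked = torch.stack(torch.unbind(x, dim=2), dim=1)
assert torch.equal(stacked, x_seq)

# permute only rearranges strides, so the result is a non-contiguous view.
print(x_seq.shape, x_seq.is_contiguous())

gru = nn.GRU(input_size=features, hidden_size=hidden, batch_first=True)
output, h_n = gru(x_seq.contiguous())
print(output.shape, h_n.shape)

# view() requires contiguous memory, so call .contiguous() before it.
flat = x_seq.contiguous().view(batch, -1)
print(flat.shape)
```

Calling `.contiguous()` on an already contiguous tensor returns the tensor itself, so adding it defensively costs nothing in that case.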

Dipika_Baad · April 24, 2019, 2:47am

Hi @gcamilo, which combination improved the charts? I am facing the same issue with validation loss increasing while the train loss is decreasing.

Al-amin_Ibrahim · January 3, 2023, 11:43pm

Hi sir, can you please add more details on the changes you made? I am struggling with the same issue. Please!