Loss isn't changing with LSTM

I wrote a simple LSTM classifier to predict the sentiment of movie reviews. Here is the link to the review.

Here is the link to the dataset: https://drive.google.com/file/d/1BpPnO-nuShi1ZaykhS1rBc7qJyC1sQlW/view?usp=share_link

While training, the loss isn’t changing at all. Can anyone help me identify the issue?

Thanks!

I only had a quick look on my phone; I might have some more time tomorrow.

Anyway, I would at least try torch.optim.Adam. That basic SGD optimizer is often not a great default.
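For example, keeping everything else the same (I’m assuming your model variable is called model; 1e-3 is simply Adam’s default learning rate, not a value taken from your code):

import torch

# Replace the basic SGD optimizer with Adam; 1e-3 is Adam's default learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)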

I would also try a learning rate scheduler. It’s possible that the model has found a shortcut, predicting the average of your training data for every input, and is stuck in a poor local minimum. Does the loss decrease at first and then quickly settle at a constant value? Also, I think 1e-1 is a very high learning rate if no scheduler or manual decay is used.

Here is an example of a learning rate scheduler I’ve had good results with.

## Calling scheduler.step() every batch
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, len(train_dataloader) * epochs_per_cycle, eta_min=0)

## Calling scheduler.step() every epoch
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, epochs_per_cycle, eta_min=0)
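In both variants, the second positional argument is T_max: the number of scheduler.step() calls over which the learning rate is annealed from its starting value down to eta_min. So the value you pass has to match how often you call scheduler.step() in your loop (see the sketch at the end of this post).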

Another thing is that I have seen advice on this forum to use this order during training:

loss.backward()
optimizer.step()
optimizer.zero_grad()

As opposed to your implementation:

optim.zero_grad()
loss.backward()
optim.step()
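Putting the pieces together, here is a minimal sketch of a training loop with that ordering and a per-batch scheduler step. I’m assuming names like model, train_dataloader, criterion (e.g. nn.BCEWithLogitsLoss for binary sentiment) and num_epochs from your setup, so adjust them to match your code:

for epoch in range(num_epochs):
    for inputs, labels in train_dataloader:
        outputs = model(inputs)            # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()                    # backprop to accumulate gradients
        optimizer.step()                   # update the weights
        optimizer.zero_grad()              # clear gradients before the next batch
        scheduler.step()                   # per-batch variant; move outside the inner loop for per-epoch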

This is by no means expert advice, but maybe it can point you in the right direction.