Train loss doesn't decrease

Hello,

I am solving a classification problem: I have a list of keywords and need to detect whether they occur in a text. I used the torchtext classification approach with EmbeddingBag and Embedding, and accuracy was about 90%, but I am hoping for 95%+.
Now I am using Embedding + LSTM and have a serious problem with the loss: it does not change. I tried learning rates all the way from 1000 down to 0.0001, but the loss only moves from 0.7 to 0.69.
Help me please :cry:

class TextClassificationModel_vec(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_size):
        super(TextClassificationModel_vec, self).__init__()
        # sparse=True makes the embedding produce sparse gradients
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0, sparse=True)
        self.lstm = nn.LSTM(input_size=embed_dim, hidden_size=hidden_size, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)
        self.dropout = nn.Dropout(p=0.6)

    def forward(self, x):
        x = self.embedding(x)        # (batch, seq_len, embed_dim)
        x = self.dropout(x)
        x, _ = self.lstm(x)          # LSTM outputs for every time step
        x = self.fc(x[:, -1, :])     # classify from the output at the last time step
        return x

vocab = Vocab(c, min_freq=1)
vocab_size = len(vocab) # 11038 
emsize = 64
hidden_size = 128
model = TextClassificationModel_vec(vocab_size, emsize, hidden_size).to(device)

EPOCHS = 15000
BATCH_SIZE = 50

criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

Have you tried any other optimizer? Like Adam or Adagrad?
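For example, something like this (just a sketch; note that because the embedding uses sparse=True, the optimizer has to handle sparse gradients, which Adagrad and plain SGD do but Adam does not):

# Adagrad handles the sparse gradients produced by nn.Embedding(..., sparse=True)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)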

Yes, I used Adam at first, but then I switched to SGD.

Have you tried printing out the gradient values for model.parameters() after calling loss.backward()? Maybe your initial gradient is near 0, so the model won’t update at all?
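Something along these lines, right after loss.backward() and before optimizer.step() (just a sketch, adapt it to your loop):

for name, p in model.named_parameters():
    if p.grad is None:
        continue
    g = p.grad
    if g.is_sparse:   # the embedding uses sparse=True, so its grad comes back as a sparse tensor
        g = g.to_dense()
    print(name, g.abs().mean().item(), g.abs().max().item())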


I have just checked. Yes, indeed: the gradients are around 10e-5 at first and then drop to 10e-45. How can I fix it?

Ok, so that’s the issue. It has nothing to do with the optimizer. Do you use any in-place operations? That would kill the gradients.
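(By "in-place" I mean operations that modify a tensor directly, e.g. the underscore methods like add_() or relu_(), or += applied to a tensor that autograd still needs. A tiny standalone illustration of how they can break the backward pass:)

import torch

a = torch.randn(4, requires_grad=True)
b = torch.sigmoid(a)    # sigmoid saves its output for the backward pass
b.add_(1.0)             # in-place edit of that saved output (note the trailing underscore)
b.sum().backward()      # raises a RuntimeError about an in-place modification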

In the train function:

total_loss += loss.item()
total_acc += (torch.round(torch.sigmoid(predicted)) == label).sum().item()

In the collate function:

def collate_batch(batch):
    label_list, text_list = [], []
    for _label, _text in batch:
        label_list.append(torch.FloatTensor([_label]))
        text = torch.tensor(_text, dtype=torch.int64)
        text_list.append(text)

    label_list = torch.stack(label_list)
    text_list = torch.stack(text_list)
    
    return label_list.to(device), text_list.to(device)
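(Side note: torch.stack only works here if every text in the batch already has the same length. If the lengths can differ, a padded collate would look roughly like this, using padding_value=0 to match padding_idx=0 in the embedding:)

import torch
from torch.nn.utils.rnn import pad_sequence

def collate_batch(batch):
    label_list, text_list = [], []
    for _label, _text in batch:
        label_list.append(torch.FloatTensor([_label]))
        text_list.append(torch.tensor(_text, dtype=torch.int64))

    labels = torch.stack(label_list)
    # pad every text to the length of the longest one in the batch
    texts = pad_sequence(text_list, batch_first=True, padding_value=0)
    return labels.to(device), texts.to(device)

One thing to double-check with padding: x[:, -1, :] in forward() would then read the LSTM output at a padded position for the shorter texts.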

I am sorry, I am a newbie to programming. :cold_sweat:

No need to apologize! :slight_smile:

When you calculate your gradients, what variable are you back-propagating? That is, are you calculating your gradients by calling total_loss.backward()?

I think it will be more convenient this way :blush:

def train(dataloader):
    model.train()
    
    total_acc, total_count = 0, 0
    total_loss = 0
    for idx, (label, text) in enumerate(dataloader):
        optimizer.zero_grad()
        predicted = model(text)
        loss = criterion(predicted, label)
        loss.backward()  # gradients come from the per-batch loss (total_loss below is just a Python float)
        optimizer.step()
        total_loss += loss.item()
        total_acc += (torch.round(torch.sigmoid(predicted)) == label).sum().item()
        total_count += label.size(0)
    return total_acc, total_loss, total_count 

I was reading through this tutorial on Text Classification and I can’t really see what’s wrong with your model.

One thing that could be a problem is the cost function BCEWithLogitsLoss: looking at the docs, it accepts a pos_weight argument, which defaults to None (i.e. no re-weighting of the positive class). Perhaps this could be the issue? If your classes are heavily imbalanced, the unweighted loss might leave you with a near-0 gradient?
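For reference, pos_weight would be passed like this (the 3.0 is just a made-up negatives-to-positives ratio, to show the call):

pos_weight = torch.tensor([3.0], device=device)   # weight applied to the positive class
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)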

Did not help :pensive:

Ah, no! :cold_sweat:

Then I don’t know what else to recommend. Perhaps there’s something of use in the tutorial I shared above? Otherwise, someone else with more experience might be able to help! :slight_smile: Sorry!


This is my first experience with LSTMs. Unfortunately, it has not been successful yet. :sweat_smile:
Thank you for your help!


I have changed the LSTM to a GRU and it works. But the accuracy is lower than when I used only the Embedding:
Embedding + Linear: acc = 0.95
Embedding + GRU + Linear: acc = 0.9
:expressionless:
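(For anyone reading later: the change was essentially just swapping nn.LSTM for nn.GRU. Roughly like this, since the exact code was not posted and the class name here is assumed:)

class TextClassificationModel_gru(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0, sparse=True)
        self.gru = nn.GRU(input_size=embed_dim, hidden_size=hidden_size,
                          num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)
        self.dropout = nn.Dropout(p=0.6)

    def forward(self, x):
        x = self.embedding(x)
        x = self.dropout(x)
        x, _ = self.gru(x)        # GRU returns (output, h_n); no cell state like the LSTM
        x = self.fc(x[:, -1, :])
        return x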


That’s pretty strange behaviour; perhaps mention this to a developer? They might have a better idea as to why it works with a GRU rather than an LSTM! :slight_smile: But glad to hear it’s working!
