Loss not changing

Hi guys, I am trying to build a text classifier with an RNN. The model runs fine, but after a couple of steps the loss starts to stagnate.
Below is the code for my model:

import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassifierModel(nn.Module):
    def __init__(self, hidden_size, num_layers, vocab_size, embedding_dim,
                 label_size):
        super(ClassifierModel, self).__init__()

        self.embedding = nn.Embedding(vocab_size, embedding_dim)

        self.lstm = nn.LSTM(
            input_size=embedding_dim,
            hidden_size=hidden_size,
            num_layers=num_layers,
            dropout=0.5,
            bidirectional=False)
        self.dense = nn.Linear(
            in_features=hidden_size, out_features=label_size)

    def forward(self, entity_ids, seq_len):
        embedding = self.embedding(entity_ids)  # (batch, seq_len, embedding_dim)
        input_size = embedding.size()
        # rearrange to (seq_len, batch, embedding_dim), the LSTM's default layout
        out, _ = self.lstm(
            embedding.view(input_size[1], input_size[0], input_size[2]))
        # pick the LSTM output at position seq_len along the time axis
        last_output = torch.index_select(out, 0, seq_len)
        logits = self.dense(last_output)
        logits = F.relu(logits[0, :, :])
        return logits

And this is the code I use for training:

import torch.optim as optim
from torch.autograd import Variable

classifier_model = ClassifierModel(128, 2, len(vocabs), 200, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(classifier_model.parameters())

for data_bucket, category_bucket, seq_len_bucket in train_data:
    for keys in data_bucket.keys():
        entity_ids = Variable(data_bucket[keys], requires_grad=False)
        category_ids = Variable(category_bucket[keys], requires_grad=False)
        seq_len = Variable(seq_len_bucket[keys], requires_grad=False)

        logits = classifier_model(entity_ids, seq_len)
        loss = loss_fn(logits, category_ids)
        print("Loss:", loss)
        pred, pred_idx = torch.max(logits, 1)
        correct_predictions = (pred_idx.data == category_bucket[keys]).sum()
        acc = correct_predictions / category_bucket[keys].size()[0]
        print("Accuracy:", acc)
        loss.backward()
        optimizer.step()

This is the output:

Accuracy: 0.5263  Loss: 0.7016
Accuracy: 0.4138  Loss: 0.6972
Accuracy: 0.4167  Loss: 0.6948
Accuracy: 0.4489  Loss: 0.6918
Accuracy: 0.4493  Loss: 0.6931
Accuracy: 0.5122  Loss: 0.6931
Accuracy: 0.5101  Loss: 0.6931
Accuracy: 0.5250  Loss: 0.6931

As you can see from the output above, after several steps the loss gets stuck at 0.6931.
Am I doing something wrong in my code? Thanks!

Check the output of your network. It is probably outputting all zeros after a while. Unfortunately, that's more a diagnosis of the problem than a fix; I'm not sure how to solve it. Perhaps try a lower learning rate.
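
For example, here is a rough sketch of that check, reusing the logits variable from your training loop:

# inside the inner training loop, right after logits = classifier_model(...)
print("logits min/max/mean:",
      logits.data.min(), logits.data.max(), logits.data.mean())
# if min and max are both 0.0, every logit has been clamped to zero
# (the final F.relu in forward() can do exactly that), and the softmax
# inside CrossEntropyLoss then yields a uniform distribution over classes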

Did you ever figure out what was wrong? I am having the exact same problem and getting the same value, 0.6931, for my loss.

Hi,
I'm using a convolutional encoder-decoder network for image segmentation, and my loss also gets stuck at 0.6931 with BCELoss() as the loss function. Has anyone found a solution to this, given that there are many reports of the same problem?

edit: one observation is that 0.6931 ≈ ln(2). I also changed my learning rate from 0.1 to 0.00001, and it works now.
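
That observation is the key: ln(2) ≈ 0.6931 is exactly the binary cross-entropy of a constant 0.5 prediction, i.e. the network has collapsed to a 50/50 guess for every output. The same value appears with a two-class CrossEntropyLoss when both logits are equal (e.g. all zeros), which matches the first post. A quick sanity check (recent PyTorch API):

import math
import torch
import torch.nn.functional as F

# BCE of a constant 0.5 prediction equals ln(2), whatever the targets:
# -[t*log(0.5) + (1 - t)*log(0.5)] = log(2)
pred = torch.full((4,), 0.5)
target = torch.tensor([0., 1., 1., 0.])
print(F.binary_cross_entropy(pred, target))  # tensor(0.6931)
print(math.log(2))                           # 0.6931471805599453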

@Jadiel_de_Armas @mnazaal my suggestion is to check every single layer of your network and verify that its weights actually change after the gradients are propagated. Common culprits: part of the network is not connected to the next part (so it receives no gradient), the loss is computed incorrectly, the learning rate is too small, or a layer's input/output dimensions don't match what you intended.
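
A minimal sketch of that weight-change check, assuming the classifier_model, loss, and optimizer from the first post:

# snapshot every parameter, run one update, then see which weights moved
before = {name: p.data.clone()
          for name, p in classifier_model.named_parameters()}

optimizer.zero_grad()   # also make sure gradients are cleared every step
loss.backward()
optimizer.step()

for name, p in classifier_model.named_parameters():
    delta = (p.data - before[name]).abs().max()
    print(name, "max weight change:", delta)
    # a parameter whose delta stays at 0.0 is not learning: it is either
    # disconnected from the loss or its gradient is always zero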