Unexpected predictions of logistic regression with embedding layer

Hello everyone!

I have the following issue, which I apparently can't solve myself.

I'm doing sentiment analysis for two classes with a simple logistic regression model (a single linear layer) and additionally use pretrained GloVe embeddings. Here is my model:

import numpy as np
import torch
import torch.nn as nn

class LogisticRegressionModel(nn.Module):
    def __init__(self, input_size: int, word_input_dim: int,
                       word_output_dim: int, word_embedding_matrix: np.ndarray,
                       output_classes: int):
        super(LogisticRegressionModel, self).__init__()

        # Frozen embedding layer initialised from the pretrained GloVe matrix
        self.word_embedding = nn.Embedding(word_input_dim, word_output_dim, padding_idx=0)
        self.word_embedding.weight = nn.Parameter(torch.tensor(word_embedding_matrix,
                                                  dtype=torch.float32))
        self.word_embedding.weight.requires_grad = False
        # Single linear layer over the concatenated word embeddings of a sequence
        self.linear = nn.Linear(input_size * word_output_dim, output_classes)

    def forward(self, x):
        word_embeddings = self.word_embedding(x)
        # Flatten (batch, seq_len, emb_dim) -> (batch, seq_len * emb_dim)
        word_embeddings = word_embeddings.view(x.shape[0], -1)
        outputs = self.linear(word_embeddings)
        return outputs
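
For context, the model is created roughly like this; the vocabulary size, sequence length and embedding dimension below are only placeholder values:

vocab_size = 10_000       # word_input_dim: vocabulary size (placeholder)
embedding_dim = 100       # word_output_dim: GloVe vector size (placeholder)
max_seq_len = 50          # input_size: padded sentence length (placeholder)

# Placeholder for the real matrix, which is filled from the GloVe file
# (one row per vocabulary index; row 0 is the padding token)
word_embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)

model = LogisticRegressionModel(input_size=max_seq_len,
                                word_input_dim=vocab_size,
                                word_output_dim=embedding_dim,
                                word_embedding_matrix=word_embedding_matrix,
                                output_classes=2)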

I train this model in batches (batch_size=32), using SGD as the optimiser and cross-entropy with probability targets as the criterion. As labels I use the probabilities of a sample belonging to each class, e.g.:

tensor([[0.0000, 1.0000], [0.6000, 0.4000], [0.3333, 0.6667], ...])
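
For reference, the criterion and optimiser are set up roughly like this (a minimal sketch; the learning rate is a placeholder). Note that nn.CrossEntropyLoss only accepts probability targets from PyTorch 1.10 onwards, so on older versions a small manual soft cross-entropy like the one below is needed:

# Soft cross-entropy: negative log-softmax of the logits weighted by the target probabilities
def soft_cross_entropy(logits, target_probs):
    log_probs = torch.nn.functional.log_softmax(logits, dim=1)
    return -(target_probs * log_probs).sum(dim=1).mean()

criterion = soft_cross_entropy  # or nn.CrossEntropyLoss() on PyTorch >= 1.10
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is a placeholder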

Here is my training snippet:

model.train()
for curr_epoch in range(num_epochs):
    for features, labels in train_loader:
        model.zero_grad()
        predictions = model(features)
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()

And I get the following predictions:

batch_1:

tensor([[-0.0055,  0.0678],
        [-0.1317,  0.3271],
        [ 0.2585,  0.0894],
...
        [ 0.0702, -0.1932],
        [ 0.0395,  0.2260],
        [-0.0769,  0.0813]], grad_fn=<AddmmBackward>)

These predictions look fine to me; they are roughly what I would expect.

batch_2:

tensor([[-32.1420,  32.2285],
        [-28.5901,  28.9668],
        [-15.8256,  15.9720],
...
        [-31.3301,  31.8487],
        [-30.2118,  30.2269],
        [-23.9350,  24.1548]], grad_fn=<AddmmBackward>)

Suddenly the predictions become huge. The batches alternate: every other batch produces values like this. I cannot figure out why this happens - what am I doing wrong here? Thank you in advance for your answers!

Sorry for the possibly poor explanation - this is the first topic I've created on the forum - and also sorry if the question is too naive; I'm still at the very beginning of my data science journey.

At first glance, your code looks fine to me.

  1. Can you plot your loss values after each epoch to get a clearer picture? The logit values themselves don't reveal much.
  2. To make sure your code is correct, take a small sample and overfit your model on it (i.e. bring the training loss down to almost zero); a rough sketch combining points 2 and 3 follows below. To keep it simple, you can first try BCELoss (instead of cross-entropy with probability targets).
  3. Try the Adam optimizer with a small learning rate.
  4. If the training loss is not decreasing, your model may be too simple. Try adding an LSTM (or a similar sequence layer).
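
A minimal sketch of the overfitting sanity check from points 2 and 3, reusing the model, criterion and train_loader from your post (the subset size, learning rate and number of epochs are placeholders):

# Take a tiny fixed subset of the training data
small_features, small_labels = next(iter(train_loader))
small_features, small_labels = small_features[:16], small_labels[:16]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # small learning rate

epoch_losses = []
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    predictions = model(small_features)
    loss = criterion(predictions, small_labels)
    loss.backward()
    optimizer.step()
    epoch_losses.append(loss.item())

print(epoch_losses[::20])  # should approach zero if the pipeline is correct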