Hello everyone!
I have the following issue, which I can’t seem to solve myself.
I’m doing sentiment analysis for two classes with a simple logistic regression model on top of pretrained GloVe embeddings. Here is my model:
import numpy as np
import torch
from torch import nn

class LogisticRegressionModel(nn.Module):
    def __init__(self, input_size: int, word_input_dim: int,
                 word_output_dim: int, word_embedding_matrix: np.ndarray,
                 output_classes: int):
        super(LogisticRegressionModel, self).__init__()
        self.word_embedding = nn.Embedding(word_input_dim, word_output_dim, padding_idx=0)
        # Load the pretrained vectors and freeze them
        self.word_embedding.weight = nn.Parameter(torch.tensor(word_embedding_matrix,
                                                               dtype=torch.float32))
        self.word_embedding.weight.requires_grad = False
        # One linear layer over the flattened (input_size * word_output_dim) embeddings
        self.linear = nn.Linear(input_size * word_output_dim, output_classes)

    def forward(self, x):
        word_embeddings = self.word_embedding(x)
        # Flatten to (batch_size, input_size * word_output_dim)
        word_embeddings = word_embeddings.view(x.shape[0], -1)
        outputs = self.linear(word_embeddings)
        return outputs
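For comparison, here is a minimal sketch of loading frozen pretrained vectors with `nn.Embedding.from_pretrained`, which should behave like the manual `nn.Parameter` assignment above. The matrix `toy_matrix` and the token ids are made-up stand-ins, not the real GloVe data, and I'm assuming row 0 is the padding vector.

```python
import numpy as np
import torch
from torch import nn

# Hypothetical stand-in for the GloVe matrix: 10 words, 4 dimensions.
rng = np.random.default_rng(0)
toy_matrix = rng.normal(size=(10, 4)).astype(np.float32)
toy_matrix[0] = 0.0  # assumed padding row

# freeze=True sets requires_grad=False on the embedding weights.
embedding = nn.Embedding.from_pretrained(
    torch.from_numpy(toy_matrix), freeze=True, padding_idx=0
)

# Two padded sequences of length 3 (made-up token ids).
token_ids = torch.tensor([[1, 2, 0], [3, 0, 0]])
flat = embedding(token_ids).view(token_ids.shape[0], -1)

print(flat.shape, embedding.weight.requires_grad)
```

Flattening like this only works if every sequence is padded to the same length (`input_size` in the model above).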
I’m training this model batch-wise (batch_size=32), using SGD as the optimiser and cross-entropy with probabilities as the criterion. As labels I use the probabilities of a sample being assigned to each class, e.g.:
tensor([[0.0000, 1.0000], [0.6000, 0.4000], [0.3333, 0.6667], ...])
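To make the criterion concrete, here is a minimal sketch of what cross-entropy with probability targets computes for a single sample, assuming the usual definition (minus the sum over classes of the target probability times the log-softmax of the logits). `soft_cross_entropy` is a hypothetical helper, not necessarily the exact criterion used here. The two logit rows are copied from the batch outputs printed further down, and pairing them with the label `[0.6, 0.4]` is purely illustrative.

```python
import math

def soft_cross_entropy(logits, target_probs):
    """Cross-entropy for one sample: -sum_c p_c * log_softmax(logits)_c."""
    # Numerically stable log-sum-exp of the logits.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(p * (z - log_norm) for p, z in zip(target_probs, logits))

target = [0.6, 0.4]  # illustrative soft label
small = soft_cross_entropy([-0.0055, 0.0678], target)    # logits near zero
large = soft_cross_entropy([-32.1420, 32.2285], target)  # logits far apart

print(f"loss with small logits: {small:.4f}")
print(f"loss with large logits: {large:.4f}")
```

With a soft target, very confident logits are penalised heavily, so the loss for the second row comes out far larger than for the first.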
Here is my training snippet:
model.train()
for curr_epoch in range(num_epochs):
    for features, labels in train_loader:
        # Clear gradients left over from the previous batch
        model.zero_grad()
        predictions = model(features)
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()
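One way to see at which step the outputs blow up is to log the largest logit, the loss, and the gradient norm for each batch. Below is a minimal diagnostic sketch. `batch_stats`, `soft_ce`, the toy model, and the toy data are all hypothetical stand-ins, and the hand-written `soft_ce` is only assumed to match the criterion used above.

```python
import torch
from torch import nn

def soft_ce(logits, target_probs):
    # Cross-entropy with probability targets, averaged over the batch.
    return -(target_probs * torch.log_softmax(logits, dim=1)).sum(dim=1).mean()

def batch_stats(model, criterion, features, labels):
    """Return (max |logit|, gradient norm, loss) for one batch, without a weight update."""
    model.zero_grad()
    predictions = model(features)
    loss = criterion(predictions, labels)
    loss.backward()
    # Combine the per-parameter gradient norms into one overall norm.
    grads = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack(grads)).item()
    return predictions.detach().abs().max().item(), grad_norm, loss.item()

# Toy stand-ins so the sketch runs on its own.
torch.manual_seed(0)
toy_model = nn.Linear(8, 2)
toy_features = torch.randn(4, 8)
toy_labels = torch.softmax(torch.randn(4, 2), dim=1)

max_logit, grad_norm, loss_value = batch_stats(toy_model, soft_ce, toy_features, toy_labels)
print(f"max |logit|={max_logit:.4f}  grad norm={grad_norm:.4f}  loss={loss_value:.4f}")
```

Calling something like this on each batch just before `optimizer.step()` would show whether the jump in the outputs coincides with an unusually large gradient.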
And I get the following predictions:
batch_1:
tensor([[-0.0055, 0.0678],
[-0.1317, 0.3271],
[ 0.2585, 0.0894],
...
[ 0.0702, -0.1932],
[ 0.0395, 0.2260],
[-0.0769, 0.0813]], grad_fn=<AddmmBackward>)
These predictions look fine to me; they are what I would expect.
batch_2:
tensor([[-32.1420, 32.2285],
[-28.5901, 28.9668],
[-15.8256, 15.9720],
...
[-31.3301, 31.8487],
[-30.2118, 30.2269],
[-23.9350, 24.1548]], grad_fn=<AddmmBackward>)
Suddenly the predictions become very large in magnitude, both positive and negative. The pattern alternates: every other batch produces values like these. I cannot figure out why this happens - what am I doing wrong here? Thank you in advance for your answers!
Sorry if the explanation is unclear - this is the first topic I’ve created on the forum. Sorry as well if the question is too naive - I’m pretty much at the beginning of my data science journey.