Below is a basic model that I want to use to classify a text into one of two classes. Somehow it doesn't actually learn anything: the training accuracy gets stuck around the class prevalence (the base rate).
Preprocessing
I used Torchtext to preprocess the texts into padded sequences with a fixed length of 500 and created data loaders with a batch size of 64. The two classes are mutually exclusive (i.e., each text belongs to exactly one class).
Model
class RNN(nn.Module):
    def __init__(self, emb_dim, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.embedding = nn.Embedding(len(TEXT.vocab), emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.embedding(x)
        out, _ = self.lstm(x)
        out = self.fc(out[-1, :, :])
        return out
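One thing worth checking in the model above is which axis `out[-1, :, :]` actually selects. The following is a minimal sketch, not a diagnosis: it assumes the Torchtext iterator yields batches in a sequence-first layout `(seq_len, batch)`, which the post does not confirm, and it uses small stand-in sizes instead of the real vocabulary and sequence length. All variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in sizes (assumptions, not the real data dimensions)
seq_len, batch, vocab, emb_dim, hidden = 7, 4, 50, 8, 16

embedding = nn.Embedding(vocab, emb_dim)
lstm = nn.LSTM(emb_dim, hidden, num_layers=2, batch_first=True)

# Assumed sequence-first batch of token ids: (seq_len, batch)
x_seq_first = torch.randint(0, vocab, (seq_len, batch))

# With batch_first=True the LSTM treats dim 0 as the batch and dim 1 as time,
# so here it iterates over the real batch axis as if it were the sequence.
out, _ = lstm(embedding(x_seq_first))
picked = out[-1, :, :]  # index -1 on dim 0, shape (batch, hidden)

# Transposing to (batch, seq_len) matches batch_first=True;
# the last time step is then index -1 on dim 1.
x_batch_first = x_seq_first.t()
out_bf, _ = lstm(embedding(x_batch_first))
last_step = out_bf[:, -1, :]  # shape (batch, hidden)

shapes = (tuple(out.shape), tuple(picked.shape),
          tuple(out_bf.shape), tuple(last_step.shape))
print(shapes)
print(torch.allclose(picked, last_step))
```

Both selections have shape `(batch, hidden)`, so the loss computes without error in either case; only the second one holds the final hidden output of each text. Printing `inputs.shape` inside the training loop would show which layout the data loaders really produce.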
Training
def train_model(model, criterion, optimizer, num_epochs=25):
    # Loop over the range of epochs
    i = 0
    for epoch in range(num_epochs):
        # Init stats for current epoch
        running_corrects = 0
        running_total = 0
        running_loss = 0.0
        print('=> Epoch {}'.format(epoch + 1))

        # Train
        model.train()
        for inputs, labels in train_dl:
            # Move to the GPU if possible
            inputs = inputs.to(device)
            labels = labels.to(device)

            # Calculate the loss
            outputs = model(inputs)
            i += 1
            loss = criterion(outputs, torch.argmax(labels, dim=1))
            running_loss += loss.item()
            running_corrects += (torch.argmax(outputs, dim=1) == torch.argmax(labels, dim=1)).sum().item()
            running_total += outputs.size(0) # batch-size

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print('Train loss: {:.4f}, acc: {}/{} - {:.4f}%'.format(
            loss.item(),
            running_corrects,
            running_total,
            running_corrects/running_total))

    return model
# Hyper-parameters
hidden_size = 128
emb_size = 64
num_layers = 2
batch_size = 64
num_epochs = 20
learning_rate = 1e-3
num_classes = 2
# Model
model = RNN(emb_size, hidden_size, num_layers, num_classes).apply(weights_init_uniform_rule).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
trained_model = train_model(model, criterion, optimizer, num_epochs)
Output
=> Epoch 1
Train loss: 0.5666, acc: 9203/12406 - 0.7418%
=> Epoch 2
Train loss: 0.5607, acc: 9238/12406 - 0.7446%
=> Epoch 3
Train loss: 0.5779, acc: 9278/12406 - 0.7479%
=> Epoch 4
Train loss: 0.5647, acc: 9293/12406 - 0.7491%
=> Epoch 5
Train loss: 0.5620, acc: 9301/12406 - 0.7497%
=> Epoch 6
Train loss: 0.5798, acc: 9317/12406 - 0.7510%
=> Epoch 7
Train loss: 0.6341, acc: 9313/12406 - 0.7507%
=> Epoch 8
Train loss: 0.5561, acc: 9315/12406 - 0.7508%
=> Epoch 9
Train loss: 0.5261, acc: 9316/12406 - 0.7509%
=> Epoch 10
Train loss: 0.4997, acc: 9317/12406 - 0.7510%
=> Epoch 11
Train loss: 0.4767, acc: 9322/12406 - 0.7514%
=> Epoch 12
Train loss: 0.4280, acc: 9321/12406 - 0.7513%
=> Epoch 13
Train loss: 0.5455, acc: 9325/12406 - 0.7517%
=> Epoch 14
Train loss: 0.4680, acc: 9324/12406 - 0.7516%
=> Epoch 15
Train loss: 0.5636, acc: 9323/12406 - 0.7515%
=> Epoch 16
Train loss: 0.4159, acc: 9324/12406 - 0.7516%
=> Epoch 17
Train loss: 0.5905, acc: 9325/12406 - 0.7517%
=> Epoch 18
Train loss: 0.4072, acc: 9325/12406 - 0.7517%
=> Epoch 19
Train loss: 0.6096, acc: 9323/12406 - 0.7515%
=> Epoch 20
Train loss: 0.6035, acc: 9324/12406 - 0.7516%
My problem is that the loss bounces up and down and the accuracy is stuck around the class prevalence. I tried changing the learning rate (from 0.1 down to 0.0001), but it makes no difference. It seems to me that the model is not actually learning anything. Any suggestions about what I might be doing wrong are highly appreciated!
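Part of the bouncing may come from how the numbers are reported: the training loop prints `loss.item()` of the last mini-batch in each epoch rather than an average over the epoch. A minimal sketch of two helpers that could make the log easier to read, a sample-weighted epoch loss and a majority-class baseline to compare the accuracy against, is shown here. The helper names and the example numbers are hypothetical and not taken from the run above.

```python
from collections import Counter


def epoch_average_loss(batch_losses, batch_sizes):
    """Sample-weighted mean of per-batch mean losses over one epoch."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical values for illustration only
avg = epoch_average_loss([0.5, 0.7], [64, 32])
base = majority_baseline([0, 0, 0, 1])
print(avg, base)
```

In the training loop this would mean accumulating `loss.item() * inputs.size(0)` per batch and dividing by `running_total` at the end of the epoch. If the resulting accuracy matches the majority baseline, the model is most likely predicting a single class for every text.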