Simple RNN stuck around the mean

Below is a basic model that I want to use to classify texts into one of two classes. Somehow it doesn't actually learn anything and the accuracy gets stuck around the mean (prevalence) of the classes.

Preprocessing

I used Torchtext to preprocess the texts into padded sequences with a fixed length of 500 and created data loaders with a batch size of 64. In my case there are two mutually exclusive classes (i.e., each text belongs to exactly one class).
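
Roughly along these lines (a minimal sketch rather than my exact code; it assumes the legacy torchtext Field/TabularDataset/BucketIterator API and a CSV file with text and label columns, and it omits the wrapper that turns the iterator's batches into the (inputs, labels) pairs used in the training loop below):

import torch
from torchtext.legacy import data

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Pad or truncate every text to exactly 500 tokens (sequence-first by default;
# set batch_first=True for (batch, seq_len) tensors)
TEXT = data.Field(fix_length=500)
LABEL = data.LabelField(dtype=torch.long)

train_data = data.TabularDataset(
    path='train.csv', format='csv',
    fields=[('text', TEXT), ('label', LABEL)], skip_header=True)

TEXT.build_vocab(train_data)
LABEL.build_vocab(train_data)

train_iter = data.BucketIterator(
    train_data, batch_size=64, shuffle=True, device=device)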

Model

class RNN(nn.Module):
    def __init__(self, emb_dim, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.embedding = nn.Embedding(len(TEXT.vocab), emb_dim)        
        self.lstm = nn.LSTM(emb_dim, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.embedding(x)
        out, _ = self.lstm(x)
        out = self.fc(out[-1, :, :])
        return out

Training

def train_model(model, criterion, optimizer, num_epochs=25):
    # Loop over the range of epochs
    for epoch in range(num_epochs):
        # Init stats for current epoch
        running_corrects = 0
        running_total = 0
        running_loss = 0.0
        
        print('=> Epoch {}'.format(epoch + 1))
        
        # Train
        model.train()
        for inputs, labels in train_dl:
            # Move to the GPU if possible
            inputs = inputs.to(device)
            labels = labels.to(device)
                     
            # Calculate the loss
            outputs = model(inputs)
            loss = criterion(outputs, torch.argmax(labels, dim=1))
            running_loss += loss.item()

            running_corrects += (torch.argmax(outputs, dim=1) == torch.argmax(labels, dim=1)).sum().item()
            running_total += outputs.size(0) # batch-size 
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        
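        # Note: loss.item() below is the last batch's loss only; running_loss accumulates the epoch total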
        print('Train loss: {:.4f}, acc: {}/{} - {:.4f}%'.format(
            loss.item(), 
            running_corrects, 
            running_total, 
            running_corrects/running_total))
                             
    return model

# Hyper-parameters
hidden_size = 128
emb_size = 64
num_layers = 2
batch_size = 64
num_epochs = 20
learning_rate = 1e-3
num_classes = 2

# Model
model = RNN(emb_size, hidden_size, num_layers, num_classes).apply(weights_init_uniform_rule).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
trained_model = train_model(model, criterion, optimizer, num_epochs)

Output

=> Epoch 1
Train loss: 0.5666, acc: 9203/12406 - 0.7418%
=> Epoch 2
Train loss: 0.5607, acc: 9238/12406 - 0.7446%
=> Epoch 3
Train loss: 0.5779, acc: 9278/12406 - 0.7479%
=> Epoch 4
Train loss: 0.5647, acc: 9293/12406 - 0.7491%
=> Epoch 5
Train loss: 0.5620, acc: 9301/12406 - 0.7497%
=> Epoch 6
Train loss: 0.5798, acc: 9317/12406 - 0.7510%
=> Epoch 7
Train loss: 0.6341, acc: 9313/12406 - 0.7507%
=> Epoch 8
Train loss: 0.5561, acc: 9315/12406 - 0.7508%
=> Epoch 9
Train loss: 0.5261, acc: 9316/12406 - 0.7509%
=> Epoch 10
Train loss: 0.4997, acc: 9317/12406 - 0.7510%
=> Epoch 11
Train loss: 0.4767, acc: 9322/12406 - 0.7514%
=> Epoch 12
Train loss: 0.4280, acc: 9321/12406 - 0.7513%
=> Epoch 13
Train loss: 0.5455, acc: 9325/12406 - 0.7517%
=> Epoch 14
Train loss: 0.4680, acc: 9324/12406 - 0.7516%
=> Epoch 15
Train loss: 0.5636, acc: 9323/12406 - 0.7515%
=> Epoch 16
Train loss: 0.4159, acc: 9324/12406 - 0.7516%
=> Epoch 17
Train loss: 0.5905, acc: 9325/12406 - 0.7517%
=> Epoch 18
Train loss: 0.4072, acc: 9325/12406 - 0.7517%
=> Epoch 19
Train loss: 0.6096, acc: 9323/12406 - 0.7515%
=> Epoch 20
Train loss: 0.6035, acc: 9324/12406 - 0.7516%

My problem is that the loss bounces up and down and the accuracy stays stuck around the class prevalence. I tried varying the learning rate (from 0.1 down to 0.0001), but it doesn't make a difference. It seems to me that the model is not actually learning anything. Any suggestions as to what I might be doing wrong are highly appreciated!

Are you making use of the hidden state?

Maybe you could use the LSTM the way it's done in this tutorial:

https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html
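
For reference, nn.LSTM returns both the per-timestep outputs and the final hidden and cell states, and the hidden state shape does not depend on batch_first (sizes below are just illustrative):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(32, 500, 64)       # (batch, seq_len, emb_dim)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([32, 500, 128]) -> top layer's output at every timestep
print(h_n.shape)     # torch.Size([2, 32, 128])   -> final hidden state of each layer; h_n[-1] is the top layer's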


That was the answer, thank you so much! I ended up following the advice from this discussion.

The final code is now:

class RNN(nn.Module):
    def __init__(self, emb_dim, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.embedding = nn.Embedding(len(TEXT.vocab), emb_dim)        
        self.lstm = nn.LSTM(emb_dim, hidden_size, num_layers)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, text):
        embeds = self.embedding(text)
        lstm_output, (last_hidden_state, last_cell_state) = self.lstm(embeds)
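        # last_hidden_state has shape (num_layers, batch, hidden_size);
        # [-1] selects the top layer's final hidden state for each sequence in the batch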
        linear_input = last_hidden_state[-1]
        out = self.fc(linear_input)
        
        return out
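
And a quick shape check of the fixed model (made-up sizes; it assumes the RNN class and TEXT vocab from above, and sequence-first batches, since batch_first=True was dropped from the nn.LSTM):

import torch

model = RNN(emb_dim=64, hidden_size=128, num_layers=2, num_classes=2)
dummy_batch = torch.randint(0, len(TEXT.vocab), (500, 8))  # (seq_len, batch)
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([8, 2]) -> one pair of class logits per text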