LSTM multiclass text classification accuracy does not change

Hi guys, I am new to deep learning models and PyTorch. I have been working on a multiclass text classification task with three output categories. I trained an LSTM model for 30 epochs with a batch size of 32, but the training accuracy fluctuates and the validation accuracy never changes. Here is my code.

class AdvancedModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        
        # Define some parameters
        self.output_dim = output_dim
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # Define layers
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, bidirectional=bidirectional, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim, output_dim)
    
        
    def forward(self, text, text_lengths):
        
        #text = [sent len, batch size]
        
        embedded = self.dropout(self.embedding(text))
        
        #embedded = [sent len, batch size, emb dim]
        
        #pack sequence; clamping the lengths to avoid errors in some edge cases
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.clamp(min=1, max=50))
        
        # Initializing hidden state for first input using method defined below
        hidden = self.init_hidden(BATCH_SIZE)
        lstm_out, hidden = self.lstm(packed_embedded, hidden)
        
        unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(lstm_out, batch_first=True)
        lstm_out=unpacked[:, -1, :]
        
        out = self.dropout(lstm_out)
        out = self.fc(out)
        
        return out
    
    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device))
        return hidden

Here are the parameters I use:

INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 300
OUTPUT_DIM = len(LABEL.vocab)
N_LAYERS = 1
BIDIRECTIONAL = False
DROPOUT = 0.5

The rest of the code is based on this notebook:
https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/5%20-%20Multi-class%20Sentiment%20Analysis.ipynb

Here is the training accuracy I get:

[screenshot: training accuracy over the 30 epochs, fluctuating without a clear upward trend]
The validation accuracy never changes for the whole 30 epochs.
It would be really helpful if anyone could point out what is wrong here. Thank you!

I don’t see anything obviously wrong with your code. Admittedly, I rarely use packing, but it looks alright. So here are just some ideas/questions:

  • Can you overfit your model on a (very) small sample dataset, i.e., do you get a training accuracy of 100% with, say, just 10-100 sentences? If not, then something fundamental is not working in your training.

  • For initial debugging, I would comment out the dropout layers, just to minimize any possible cause for issues (although there shouldn’t be one here). For trying to overfit the model, they are not needed anyway.

  • Can you post the training loop? While the model looks alright, much can go wrong in the training part.

  • You don’t say how big your dataset is, and it’s difficult to judge its “nature”. Anecdotally, I’ve noticed that text classifiers take a while to learn before showing improvements in accuracy. But still, 30 epochs should probably lead to some improvement.

  • I assume you do proper text preprocessing such as lowercasing, removing quotation marks, and ignoring rare words (not stopwords!) to minimize the vocabulary, e.g., via the min_freq argument of build_vocab.

  • As you’ve probably noticed, right now you cannot simply set bidirectional=True or n_layers > 1, since that changes the shape of the input to the final linear layer. Just a comment for the future; see the sketch right after this list.
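
For example, a rough sketch of how the model could support both options (hypothetical code, not from your post; it sizes the linear layer by the number of directions and uses the final hidden state instead of indexing the unpacked output):

import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim,
                 n_layers, bidirectional, dropout, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            bidirectional=bidirectional, batch_first=True)
        num_directions = 2 if bidirectional else 1
        # the linear layer must match hidden_dim * num_directions
        self.fc = nn.Linear(hidden_dim * num_directions, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text, text_lengths):
        # text = [batch size, sent len] because of batch_first=True
        embedded = self.dropout(self.embedding(text))
        packed = nn.utils.rnn.pack_padded_sequence(
            embedded, text_lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, (hidden, _) = self.lstm(packed)
        # hidden = [n_layers * num_directions, batch size, hidden dim]
        if self.lstm.bidirectional:
            # concatenate the last layer's forward and backward hidden states
            hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
        else:
            hidden = hidden[-1, :, :]
        return self.fc(self.dropout(hidden))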

Thank you very much for your reply. I am experimenting with a small dataset of 5376 training examples and 3024 validation examples. Here is the code for the training part:

#### Training process
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        text, text_lengths = batch.t
        
        if len(text_lengths.numpy())==BATCH_SIZE:
        
            predictions = model(text, text_lengths).squeeze(1)
        
            loss = criterion(predictions, batch.l)
        
            acc = accuracy(predictions, batch.l)
        
            loss.backward()
        
            optimizer.step()
        
            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

#### Evaluation process
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            text, text_lengths = batch.t
            if len(text_lengths.numpy())==BATCH_SIZE:
            
                predictions = model(text, text_lengths).squeeze(1)
            
                loss = criterion(predictions, batch.l)
            
                acc = accuracy(predictions, batch.l)

                epoch_loss += loss.item()
                epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

I added if len(text_lengths.numpy())==BATCH_SIZE: because the validation set size is not divisible by the batch size, and skipping the last (smaller) batch fixed an error I was getting, presumably because the hidden state is initialized with the fixed BATCH_SIZE. I am not sure whether this causes other problems.
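
An alternative I might try instead of skipping incomplete batches would be to derive the batch size from the input inside forward rather than using the global BATCH_SIZE; a rough sketch, adapting the forward above (nn.LSTM would also default to zero states if the explicit init_hidden call were dropped entirely):

    def forward(self, text, text_lengths):
        embedded = self.dropout(self.embedding(text))
        packed_embedded = nn.utils.rnn.pack_padded_sequence(
            embedded, text_lengths.clamp(min=1, max=50))
        # text_lengths has one entry per sequence, so this is the true batch size
        hidden = self.init_hidden(text_lengths.size(0))
        lstm_out, hidden = self.lstm(packed_embedded, hidden)
        unpacked, unpacked_len = nn.utils.rnn.pad_packed_sequence(lstm_out, batch_first=True)
        out = self.dropout(unpacked[:, -1, :])
        return self.fc(out)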
Then the training loop is as follows:

modelOutputName = 'LSTM-rnn.pt'

N_EPOCHS = 30

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    start_time = time.time()
    
    train_loss, train_acc = train(model_adv, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model_adv, valid_iterator, criterion)
    
    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model_adv.state_dict(), modelOutputName)
    
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

Thank you for all the suggestions. Could you explain more about how to overfit the model? I am not quite sure how this works with an LSTM model. Do I just comment out dropout and use a smaller sample, as you suggested? Thank you.

Overfitting is when the model parameters are tuned excessively to the training dataset without generalizing to the validation set. You just keep training for more epochs without concern for the validation loss; if the training loss goes to zero, you can say that your model has overfitted to the training dataset.
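
For a concrete sanity check, something like this rough sketch (reusing the names from your code) would do: grab one batch and train on it over and over, and watch whether the training loss goes toward zero.

batch = next(iter(train_iterator))
text, text_lengths = batch.t

model_adv.train()
for step in range(200):
    optimizer.zero_grad()
    predictions = model_adv(text, text_lengths)
    loss = criterion(predictions, batch.l)
    loss.backward()
    optimizer.step()
    if step % 20 == 0:
        print(step, loss.item())
# if this loss does not approach zero, something in the model
# or the training setup is fundamentally broken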

Thank you very much for your explanation! I have fixed the problem now!