LSTM not learning. Accuracy is just fluctuating

Hello. I am working on a multi-channel LSTM classifier that uses sequences corresponding to different attributes of an object in order to classify it. For example, given a drone’s x-position, y-position and z-position, I create three LSTM channels, one per attribute sequence, then concatenate the LSTM outputs and feed them into a fully connected layer. My code is shown below. However, my model accuracy is not improving; it just jumps around in the range of 50% to 70%. As a newbie to machine learning in general, I was wondering if there’s something I’m doing wrong in the code.

import torch
import torch.nn as nn


class MultipleAttributeClassifier(nn.Module):
    def __init__(
        self,
        num_classes: int,
        num_attributes: int,
        input_size: int,
        hidden_size: int,
        num_lstm_layers: int,
        dropout_ratio: int=0.5,
    ) -> None:
        
        super().__init__()

        self.num_classes = num_classes
        
        # Layers
        # Empty list to hold LSTM chains for each attribute
        self.lstms = nn.ModuleList()
        self._create_lstms_for_attributes(
            num_attributes=num_attributes,
            input_size=input_size,
            hidden_size=hidden_size,
            num_lstm_layers=num_lstm_layers,
            dropout_ratio=dropout_ratio
        )
        
        self.fc1 = nn.Linear(hidden_size * num_attributes, 64)
        self.fc2 = nn.Linear(64, num_classes)
        self.dropout = nn.Dropout(dropout_ratio)

    def _create_lstms_for_attributes(
        self,
        num_attributes,
        input_size,
        hidden_size,
        num_lstm_layers,
        dropout_ratio
    ):
        for _ in range(num_attributes):
            lstm = nn.LSTM(
                input_size=input_size,
                hidden_size=hidden_size,
                num_layers=num_lstm_layers,
                batch_first=True, # Warning: Influences required input shape
                dropout=dropout_ratio
            )

            self.lstms.append(lstm)

    def forward(self, x):
        # x has shape (batch_size, sequence_length, num_attributes, input_size)
        out = []
        
        # Run each attribute sequence through LSTM (chain) for the specific attribute
        for attribute_id in range(x.shape[2]):
            batched_attribute_seq = x[:, :, attribute_id, :]   
            
            lstm_output, (hidden, cell) = self.lstms[attribute_id](batched_attribute_seq)

            # Get hidden state of last layer
            hidden = hidden[-1,:,:]
            out.append(hidden)

        # Concatenate hidden states for each attribute along hidden size dimension
        out = torch.cat(out, dim=-1)
        
        out = self.fc1(out)
        out = self.dropout(out)
        out = self.fc2(out)

        return out

It seems your dropout ratio is too high (and it is actually a ‘float’, not an ‘int’). Try decreasing it to 0.1-0.2.

Maybe your learning rate is too high? It’s impossible to say for sure, though. You’d need to build some hypotheses around why this might be happening and then validate them.
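
For example, you could start by confirming the learning rate the optimiser is actually using and comparing how the training loss behaves for a few values. A rough sketch (build_model and train_one_epoch here are stand-ins for your own code):

import torch

# Try a few learning rates and watch the training loss curve for each.
# build_model() and train_one_epoch() are stand-ins for your own code.
for lr in (1e-2, 1e-3, 1e-4):
    model = build_model()
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    print(f"requested lr={lr}, optimiser lr={optimiser.param_groups[0]['lr']}")
    for epoch in range(5):
        loss = train_one_epoch(model, optimiser)
        print(f"  epoch {epoch}: training loss {loss:.4f}")

If the loss oscillates or diverges at the higher rates but decreases steadily at the lower ones, that is fairly strong evidence the learning rate is the problem.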

Thank you for the responses. All right, I’ll focus a bit more on hyperparameter tuning, starting with the dropout and the learning rate. My biggest concern was the code itself, specifically from the line hidden = hidden[-1,:,:] onwards; I wasn’t sure whether that was correct.
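
For reference, this is the kind of quick check I was planning to run on that line (just a standalone sketch matching my unidirectional, batch_first setup, not my actual model):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=64, num_layers=2, batch_first=True)
x = torch.randn(8, 30, 1)  # (batch_size, sequence_length, input_size)
output, (hidden, cell) = lstm(x)

# hidden has shape (num_layers, batch_size, hidden_size), so hidden[-1] is the
# last layer's final hidden state. For a unidirectional LSTM it should match
# the output at the last time step.
print(hidden.shape)                                   # torch.Size([2, 8, 64])
print(torch.allclose(hidden[-1], output[:, -1, :]))   # True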

To add some more context anyway, my hyperparameters are as follows:

NUM_EPOCHS=20
NUM_CLASSES=6
NUM_ATTRIBUTES=6
INPUT_SIZE=1
HIDDEN_SIZE=64
NUM_LSTM_LAYERS=2
DROPOUT_RATIO=0.5
LEARNING_RATE=0.001
OPTIMISER="Adam"
LOSS="Cross Entropy"
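
And roughly how everything is wired up for training (a simplified sketch of my setup; train_loader stands in for my actual DataLoader):

model = MultipleAttributeClassifier(
    num_classes=NUM_CLASSES,
    num_attributes=NUM_ATTRIBUTES,
    input_size=INPUT_SIZE,
    hidden_size=HIDDEN_SIZE,
    num_lstm_layers=NUM_LSTM_LAYERS,
    dropout_ratio=DROPOUT_RATIO,
)
criterion = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

for epoch in range(NUM_EPOCHS):
    model.train()
    for sequences, labels in train_loader:
        # sequences: (batch_size, sequence_length, num_attributes, input_size)
        optimiser.zero_grad()
        logits = model(sequences)
        loss = criterion(logits, labels)
        loss.backward()
        optimiser.step()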

You were right, it was the learning rate. I thought I’d set it to 0.001, but it was actually 0.01. Thanks so much.
