LSTM network predicting the same class for all training examples

Hi there, I am new to PyTorch and I am trying to use an LSTM network to predict lane-following and lane-changing behaviors for autonomous driving. I am using data from the NGSIM database and I have 3 classes, which I have encoded as one-hot vectors. All of my predictions keep landing on the same class, and I think that something is fundamentally wrong with my code. Any suggestions would be greatly appreciated. Thank you.

Here is a part of my code (based on code I found on the internet):

num_train = 3000
h1 = 32
output_dim = 3
num_layers = 5
learning_rate = 1e-3
num_epochs = 30

per_element = True
if per_element:
    lstm_input_size = 1
else:
    lstm_input_size = input_size

X_train = torch.from_numpy(xtrain).type(torch.Tensor)
X_train = X_train.view([input_size, -1, 1])

#Arrange labels as one-hot vectors
ytrain = np.zeros([3000, 3])
for i in range(3000):
	if i < 1000:
		ytrain[i,0] = 1
	elif i < 2000:
		ytrain[i,1] = 1
	elif i < 3000:
		ytrain[i,2] = 1

y_train = torch.from_numpy(ytrain).type(torch.Tensor).view(-1)

class LSTM(nn.Module):

    def __init__(self, input_dim, hidden_dim, batch_size, output_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.num_layers = num_layers

        # Define the LSTM layer
        self.lstm = nn.LSTM(self.input_dim, self.hidden_dim, self.num_layers)

        # Define the output layer
        self.linear = nn.Linear(self.hidden_dim, output_dim)

    def init_hidden(self):
        return (torch.zeros(self.num_layers, self.batch_size, self.hidden_dim),
                torch.zeros(self.num_layers, self.batch_size, self.hidden_dim))

    def forward(self, input):
        lstm_out, self.hidden = self.lstm(input.view(len(input), self.batch_size, -1))
        
        y_pred = self.linear(lstm_out[-1].view(self.batch_size, -1))
        return y_pred.view(-1)

model = LSTM(lstm_input_size, h1, batch_size=num_train, output_dim=output_dim, num_layers=num_layers)

loss_fn = nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(num_epochs):		
	model.hidden = model.init_hidden()

	y_pred = model(X_train)

	loss = loss_fn(y_train, y_pred)

	print("Epoch ", t, "\nMSE: ", loss.item())

	hist[t] = loss.item()

	optimiser. zero_grad()

	loss.backward()

	optimiser.step()

Without seeing the data, it’s difficult to say. Here are some questions and comments that might help:

  • Is the loss going down at all? That’s actually the first thing to check.
  • You don’t really use batches; you feed all of your training data at once in each epoch. Note that the batch size also affects which learning rate is most suitable.
  • optimiser. zero_grad(): not sure if the whitespace is just a typo here or if Python actually cares :)
  • There seem to be some view() calls that shouldn’t be needed, I think. For example, in lstm_out[-1].view(self.batch_size, -1), is the view() really needed? lstm_out should have the shape (seq_len, batch_size, hidden_dim), so taking the last step should already give the correct shape of (batch_size, hidden_dim) for the linear layer.
  • The values of y_train don’t seem to come from data. I don’t know the NGSIM dataset. Are there really just 3,000 items, with the first 1,000 of class 1, the second 1,000 of class 2, and the last 1,000 of class 3? I’m not saying that’s wrong; it just looks odd given that I don’t know the data.
  • Try using 1 layer for the LSTM first. 3,000 data items are not much for training, particularly when the network is complex.
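  • What does lstm_input_size = 1 mean? For nn.LSTM that value is the number of features per time step, not the sequence length, so with X_train.view([input_size, -1, 1]) each sequence has input_size steps of one value each. Note that view() only reinterprets memory: if xtrain is laid out as (samples, input_size), this view mixes values from different samples instead of transposing them. Again, I feel that something with your data is probably off, not with the model.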
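To make the shape comments above concrete, here is a minimal sketch. The sizes are made up and the (samples, time steps) layout of the raw array is an assumption, since the thread never shows xtrain. It compares view() with a real transpose and checks that lstm_out[-1] needs no further view().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
num_samples, seq_len, hidden_dim = 6, 4, 8

# Assumption: the raw array is laid out as (samples, time steps).
x = torch.arange(num_samples * seq_len, dtype=torch.float32).view(num_samples, seq_len)

# nn.LSTM with batch_first=False expects (seq_len, batch, features).
# view() only reinterprets memory, so sequences get mixed across samples.
x_view = x.view(seq_len, num_samples, 1)
# Transposing first keeps each sample's time series intact.
x_seq = x.t().unsqueeze(-1).contiguous()  # (seq_len, num_samples, 1)

lstm = nn.LSTM(input_size=1, hidden_size=hidden_dim, num_layers=1)
lstm_out, (h_n, c_n) = lstm(x_seq)

# The last time step is already (batch, hidden_dim); no view() is needed.
last_step = lstm_out[-1]
print(lstm_out.shape, last_step.shape)

# Sequence 0 after the transpose is sample 0; after view() it is not.
same_series = torch.equal(x_seq[:, 0, 0], x[0])
mixed_series = not torch.equal(x_view[:, 0, 0], x[0])
print(same_series, mixed_series)
```

If the real xtrain is already stored as (time steps, samples), the view() in the original code would be fine, so it is worth printing a few sequences before and after the reshape.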

+1 to everything Chris says above. A few other thoughts.

  • h1 = 32: you say only 3 classes but your output is 32. I’ve seen this in other models also. I don’t understand the use of these extra outputs.
  • MSELoss: I could be wrong about this, so feel free to correct me. I thought MSELoss was more appropriate for predicting real values (regression) than for a classification task. I would use CrossEntropyLoss, as it includes a softmax. The LSTM module uses a tanh activation function, so I “think” you can end up with negative activations, which may cause problems with the summation within the MSELoss. As I said, I could be wrong about this, so anyone feel free to correct me. It is something I wanted to research further.

h1 = 32 is the size of the hidden state, which is independent of the output. Hence the last linear layer, which maps the hidden state to the output (self.linear = nn.Linear(self.hidden_dim, output_dim)). That’s the normal setup. You could also have a series of linear layers (with activation functions, optional dropout, etc.) as long as the last one maps to the output classes.
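As a rough illustration of that setup, the sketch below maps a 32-dimensional hidden state to 3 class scores through a small stack of linear layers. The intermediate width of 16 and the dropout rate are arbitrary choices for the example, not values from the original code.

```python
import torch
import torch.nn as nn

hidden_dim, num_classes, batch_size = 32, 3, 5

# Hypothetical classifier head: only the last layer has to map to the classes.
head = nn.Sequential(
    nn.Linear(hidden_dim, 16),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(16, num_classes),
)

# Stand-in for the LSTM's last hidden state, shape (batch, hidden_dim).
last_hidden = torch.randn(batch_size, hidden_dim)
scores = head(last_hidden)
print(scores.shape)  # one score per class for each sample
```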

Good point regarding the loss function, though. I’m more of a beginner myself, so I’m not sure either. I usually go with log_softmax as the last step in the forward method and then NLLLoss to compute the loss.
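For reference, the two options mentioned here compute the same loss. Below is a minimal sketch with random stand-in scores. It assumes the model returns raw scores of shape (batch, classes) rather than a flattened vector, and that the one-hot labels are converted to class indices, which is what both loss functions expect.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

num_samples, num_classes = 6, 3

# Stand-in for the output of the final linear layer: raw scores (logits).
logits = torch.randn(num_samples, num_classes)

# One-hot labels as in the original post, converted to class indices.
one_hot = torch.eye(num_classes)[torch.tensor([0, 0, 1, 1, 2, 2])]
targets = one_hot.argmax(dim=1)  # integer class index per sample

# Option 1: CrossEntropyLoss applies log_softmax internally.
ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: explicit log_softmax followed by NLLLoss.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# The predicted class is the index of the largest score.
predicted = logits.argmax(dim=1)
print(ce.item(), nll.item())
```

With CrossEntropyLoss the forward method should return the raw scores; adding log_softmax in the model as well would apply it twice.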

Ah yes, great point on the last linear layer as output. I missed that. Now h1 = 32 makes sense.

Did you solve this problem? Is that code from Jessica Jung? My predictions are just a straight line when I plot them. I tried this code with a household electricity consumption dataset. Do you have any way to fix this problem?
Thank you so much.

@anna2712 were you able to find a solution? I’m having the same problem as you are.

I’ve just spent a few days struggling with a similar issue. It seems that the variance in my data was too small. I was able to solve the issue by running the data through a BatchNorm layer prior to feeding it to my LSTM.
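In case it helps anyone, here is a minimal sketch of what I mean. The model name, layer sizes, and the synthetic low-variance input are all made up for illustration; it assumes inputs shaped (batch, seq_len, features).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class NormalizedLSTM(nn.Module):
    """Hypothetical model: BatchNorm over the feature dimension, then an LSTM."""

    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.norm = nn.BatchNorm1d(input_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # x: (batch, seq_len, input_dim).
        # BatchNorm1d expects (batch, channels, length), so swap the last two dims.
        x = self.norm(x.transpose(1, 2)).transpose(1, 2)
        lstm_out, _ = self.lstm(x)
        # Use the output at the last time step for classification.
        return self.linear(lstm_out[:, -1])


# Synthetic inputs with a large offset and small variance.
x = 5.0 + 0.1 * torch.randn(16, 10, 1)

model = NormalizedLSTM(input_dim=1, hidden_dim=32, output_dim=3)
normed = model.norm(x.transpose(1, 2))  # roughly zero mean, unit variance
logits = model(x)
print(normed.mean().item(), normed.std().item(), logits.shape)
```

Standardizing the data once during preprocessing would be an alternative to the BatchNorm layer; I have not compared the two.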