Hi there, I am new to PyTorch and I am trying to use an LSTM network to predict lane-following and lane-changing behaviors for autonomous driving. I am using data from the NGSIM database and I have 3 classes, which I have encoded as one-hot vectors. All of my predictions keep landing on the same class, and I think that something is fundamentally wrong with my code. Any suggestions would be greatly appreciated. Thank you.
Here is a part of my code (based on code I found on the internet):
Without seeing the data, it’s difficult to say. Here are some questions and comments that might help:
Is the loss going down at all? That’s actually the first thing to check.
You don’t really use batches but feed all of your training data at once in each epoch. Note that the batch size also affects which learning rate is most suitable.
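A minimal sketch of what mini-batch training could look like, with the loss logged per epoch so you can see whether it goes down. The random tensors, the batch size of 64, the Adam optimiser and the use of CrossEntropyLoss are all assumptions for illustration, not taken from the original code:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder data (assumption): 3,000 sequences, 20 time steps, 1 feature each.
X_train = torch.randn(3000, 20, 1)
y_train = torch.randint(0, 3, (3000,))  # class indices 0, 1, 2

# shuffle=True matters if the data is stored sorted by class.
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=64, shuffle=True)

lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
linear = nn.Linear(32, 3)
params = list(lstm.parameters()) + list(linear.parameters())
optimiser = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

epoch_losses = []
for epoch in range(2):
    running = 0.0
    for xb, yb in loader:
        optimiser.zero_grad()              # clear gradients from the previous step
        lstm_out, _ = lstm(xb)             # (batch, seq_len, hidden)
        logits = linear(lstm_out[:, -1])   # last time step -> (batch, 3)
        loss = loss_fn(logits, yb)
        loss.backward()
        optimiser.step()
        running += loss.item() * xb.size(0)
    epoch_losses.append(running / len(X_train))
    print(f"epoch {epoch}: mean loss {epoch_losses[-1]:.4f}")
```

With random labels the loss will not drop much; the point is only the loop structure. If you change the batch size, expect to retune the learning rate as well.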
optimiser. zero_grad(): not sure if the whitespace is just a typo here or if Python actually cares.
There seem to be some view() calls that shouldn’t be needed, I think. For example, in lstm_out[-1].view(self.batch_size, -1), is the view() really needed? lstm_out should have the shape (seq_len, batch_size, hidden_dim), so taking the last step should already give the correct shape of (batch_size, hidden_dim) for the linear layer.
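A quick shape check along these lines, assuming the default batch_first=False and made-up sizes (20 time steps, batch of 8, hidden size 32):

```python
import torch
from torch import nn

seq_len, batch_size, input_dim, hidden_dim = 20, 8, 1, 32

lstm = nn.LSTM(input_dim, hidden_dim)            # batch_first=False by default
x = torch.randn(seq_len, batch_size, input_dim)  # placeholder input

lstm_out, (h_n, c_n) = lstm(x)
print(lstm_out.shape)                            # torch.Size([20, 8, 32])

last = lstm_out[-1]                              # output at the last time step
print(last.shape)                                # torch.Size([8, 32])

# The view() does not change anything here: same shape, same values.
view_is_noop = torch.equal(last, last.view(batch_size, -1))

# For a single-layer, unidirectional LSTM the last output matches the final hidden state.
matches_hidden = torch.allclose(last, h_n[-1])
```

So at least for these shapes the extra view() is harmless but redundant.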
The values of y_train don’t seem to come from the data. I don’t know the NGSIM dataset. Are there really just 3,000 items, with the first 1,000 of class 1, the second 1,000 of class 2, and the last 1,000 of class 3? I’m not saying that’s wrong; it just looks odd given that I don’t know the data.
Try using 1 layer for the LSTM first. 3,000 data items are not much for training, particularly when the network is complex.
What does lstm_input_size = 1 mean? In this case your sequence length is 1 and you wouldn’t need an RNN layer at all. Again, together with the X_train.view([input_size, -1, 1]), I feel that something is probably off with your data, not with the model.
+1 to everything Chris says above. A few other thoughts.
h1 = 32: you say only 3 classes but your output is 32. I’ve seen this in other models also. I don’t understand the use of these extra outputs.
MSELoss: I could be wrong about this, so feel free to correct me. I thought MSELoss was more appropriate for predicting real values (regression) than for a classification task. I would use cross-entropy (CrossEntropyLoss) as it includes a softmax. The LSTM module uses a tanh activation function, so I “think” you can end up with negative activations, which may cause problems with the summation within the MSELoss. As I said, I could be wrong about this, so anyone feel free to correct me. It’s something I wanted to research further.
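For what it’s worth, here is a small sketch of the cross-entropy route. nn.CrossEntropyLoss takes raw scores (negative values are fine) and class indices, so one-hot targets can be converted with argmax. The target and score tensors are made-up values for illustration:

```python
import torch
from torch import nn

# Hypothetical one-hot targets for 4 samples and 3 classes.
y_onehot = torch.tensor([[1., 0., 0.],
                         [0., 1., 0.],
                         [0., 0., 1.],
                         [0., 1., 0.]])
y_idx = y_onehot.argmax(dim=1)      # class indices: 0, 1, 2, 1

# Raw, unnormalised scores from the last linear layer (no softmax applied).
logits = torch.tensor([[ 2.0, -1.0,  0.5],
                       [-0.3,  1.2, -2.0],
                       [ 0.1,  0.1,  3.0],
                       [ 1.5,  0.2, -0.7]])

loss = nn.CrossEntropyLoss()(logits, y_idx)
pred = logits.argmax(dim=1)         # predicted class per sample
print(loss.item(), pred.tolist())
```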
h1 = 32 is the size of the hidden state, which is independent of the output. Hence the last linear layer that maps the hidden state to the output (self.linear = nn.Linear(self.hidden_dim, output_dim)). That’s the normal setup. You could also have a series of linear layers (with activation functions, optional dropout, etc.) as long as the last one maps to the output classes.
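Roughly, the setup looks like the sketch below. The class name LSTMClassifier and the input sizes are my own placeholders, not the original poster’s code:

```python
import torch
from torch import nn

class LSTMClassifier(nn.Module):
    """Hypothetical minimal model: LSTM with hidden size 32, linear layer to 3 classes."""

    def __init__(self, input_dim=1, hidden_dim=32, output_dim=3, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers)
        # Maps the hidden state (size 32) to one score per class (size 3).
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):                  # x: (seq_len, batch, input_dim)
        lstm_out, _ = self.lstm(x)         # (seq_len, batch, hidden_dim)
        return self.linear(lstm_out[-1])   # (batch, output_dim)

model = LSTMClassifier()
scores = model(torch.randn(20, 8, 1))      # 20 time steps, batch of 8, 1 feature
print(scores.shape)                        # torch.Size([8, 3])
```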
Good point regarding the loss function, though. I’m more of a beginner myself, so I’m not sure either. I usually go with log_softmax as the last step in the forward method and then NLLLoss to compute the loss.
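As far as I can tell, log_softmax followed by NLLLoss gives the same value as CrossEntropyLoss applied to the raw scores. A small check with random scores and made-up targets:

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(5, 3)                  # placeholder scores: 5 samples, 3 classes
targets = torch.tensor([0, 2, 1, 1, 0])     # placeholder class indices

# Option 1: log_softmax in forward(), then NLLLoss.
log_probs = F.log_softmax(logits, dim=1)
nll = nn.NLLLoss()(log_probs, targets)

# Option 2: raw scores straight into CrossEntropyLoss.
ce = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(nll, ce))              # True
```

Just don’t combine a log_softmax in forward() with CrossEntropyLoss, since the softmax would then be applied twice.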
Did you solve this problem? Is that code from Jessica Jung? My predictions are just a straight line when I plot them. I tried this code with a household electricity consumption dataset. Do you have any way to fix this problem?
Thank you so much.
I’ve just spent a few days struggling with a similar issue. It seems that the variance in my data was too small. I was able to solve the issue by running the data through a BatchNorm layer before feeding it to my LSTM.
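In case it helps, this is roughly the idea as a sketch. The class name NormalisedLSTM, the sizes, and the low-variance dummy data are illustrative assumptions. nn.BatchNorm1d expects the feature dimension second, hence the transposes:

```python
import torch
from torch import nn

class NormalisedLSTM(nn.Module):
    """Hypothetical model: BatchNorm over the input features, then LSTM, then linear."""

    def __init__(self, input_dim=1, hidden_dim=32, output_dim=3):
        super().__init__()
        self.norm = nn.BatchNorm1d(input_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):                  # x: (batch, seq_len, input_dim)
        # BatchNorm1d wants (batch, features, seq_len), so swap the axes and swap back.
        x = self.norm(x.transpose(1, 2)).transpose(1, 2)
        lstm_out, _ = self.lstm(x)         # (batch, seq_len, hidden_dim)
        return self.linear(lstm_out[:, -1])

torch.manual_seed(0)
# Dummy input with a very small variance around a constant offset (assumption).
x = 5.0 + 0.01 * torch.randn(16, 20, 1)

model = NormalisedLSTM()
normed = model.norm(x.transpose(1, 2))     # what the LSTM sees, in training mode
out = model(x)
print(x.std().item(), normed.std().item(), out.shape)
```

Standardising the data once up front (subtracting the training mean and dividing by the training standard deviation) is a simpler alternative that may work just as well.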