LSTM doesn't train

Hello everybody,

I learned Keras and now I am learning PyTorch; I am a beginner. I tried to use an LSTM (both in Keras and PyTorch), and the PyTorch one doesn’t train. I know approximately how the loss and the accuracy should evolve with Keras, and here they don’t change during the epochs, so I assume my PyTorch code is not correct. I just want to use one LSTM layer with 256 units and one linear layer.
This is my PyTorch code :

I want to do classification with 3 classes, and the data are 1D time series with 17908 values each. The training set contains 14001 time series, so the matrix has shape 14001x17908.
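For reference, here is a minimal sketch of the setup described above (one LSTM layer with 256 units feeding one linear layer for 3 classes); the class name, hidden size, and shapes are assumptions based on this description, not the original code:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """One LSTM layer (256 hidden units) followed by one linear layer."""
    def __init__(self, n_features, hidden_size=256, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features,
                            hidden_size=hidden_size,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: [batch, seq_len, n_features] because batch_first=True
        out, _ = self.lstm(x)
        # classify from the output of the last time step
        return self.fc(out[:, -1, :])

model = LSTMClassifier(n_features=17908)
x = torch.randn(4, 1, 17908)  # [batch=4, seq_len=1, features=17908]
logits = model(x)
print(logits.shape)           # torch.Size([4, 3])
```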

Thanks for your help :smile: :grin:

Based on your description and this code:

X_valid = np.reshape(X_valid, (X_valid.shape[0], 1, X_valid.shape[1]))
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))

as well as the usage of batch_first=True in the nn.LSTM module, the input data should have a shape of [batch_size=14001, seq_len=1, nb_features=17908].
If that’s correct, note that you would only be using a single time step, so the nn.LSTM module might not be really useful.
Could you explain the approach you’ve used in Keras/TF, i.e. which shapes (in particular sequence lengths) were used there?

PS: unrelated to this issue, but it seems you would like to transform the numpy arrays to tensors here:

X_train, X_valid = [torch.tensor(arr, dtype=torch.float32) for arr in (X_train, X_valid)]
y_train, y_valid = [torch.tensor(arr, dtype=torch.long) for arr in (target_train, target_valid)]

If so, you could use:

X_train = torch.from_numpy(X_train).float()
X_valid = ...
y_train = torch.from_numpy(y_train).long()
y_valid = ...
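A small demonstration of the practical difference between the two approaches (an illustrative sketch, not from the original code): torch.from_numpy shares memory with the numpy array, while torch.tensor makes a copy.

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy of arr

arr[0] = 1.0                    # mutate the numpy array
print(shared[0].item())         # 1.0 — reflects the numpy change
print(copied[0].item())         # 0.0 — unaffected copy
```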

Thanks for your answer :grin: !!

The Keras approach I used:

I think I use the same sequence lengths…

Thanks !!

Thanks for the advice about the tensor transformation :slight_smile:

In fact it is not really a “time series” but a probability density function (in log), so it is complicated to have several sequence lengths, I think…

Thanks for the update.
Based on the Keras LSTM docs it seems that the input should have the shape:

inputs: A 3D tensor with shape [batch, timesteps, feature].

The linked Keras implementation uses tensors as:

X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_validation = np.reshape(X_validation, (X_validation.shape[0], X_validation.shape[1], 1))

so it seems that the feature dimension is set to one and the temporal dimension is large.
If I’m not misunderstanding the Keras docs or the posted Keras code, I guess this would be the main difference between both codes.
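To make the difference concrete, here is a small sketch (shapes taken from the thread) of how the same batch of series would be laid out for each approach in PyTorch:

```python
import torch

x = torch.randn(8, 17908)        # 8 example series of length 17908

# Keras-like layout: long sequence, one feature per time step
x_keras_like = x.unsqueeze(-1)   # [8, 17908, 1]

# Layout in the posted PyTorch code: one time step, 17908 features
x_current = x.unsqueeze(1)       # [8, 1, 17908]

print(x_keras_like.shape)        # torch.Size([8, 17908, 1])
print(x_current.shape)           # torch.Size([8, 1, 17908])
```

Note that with the Keras-like layout the nn.LSTM would need input_size=1 and would iterate over 17908 time steps, which can be quite slow.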

Refer to the docs of the torch LSTM for how the input data should be arranged. It is an excellent piece of documentation. It feels like the input data is not arranged properly in the code. Definitely use the batch_first=True option in the LSTM; it makes life a lot easier.

Docs: LSTM_Pytorch
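As a quick illustration of what batch_first changes (a minimal sketch with arbitrary small sizes): with batch_first=True the input is [batch, seq_len, features], while the default layout is [seq_len, batch, features].

```python
import torch
import torch.nn as nn

lstm_bf = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
lstm_sf = nn.LSTM(input_size=1, hidden_size=4)  # default: seq-first

x_bf = torch.randn(2, 5, 1)    # [batch=2, seq_len=5, features=1]
x_sf = x_bf.transpose(0, 1)    # [seq_len=5, batch=2, features=1]

out_bf, _ = lstm_bf(x_bf)
out_sf, _ = lstm_sf(x_sf)
print(out_bf.shape)            # torch.Size([2, 5, 4])
print(out_sf.shape)            # torch.Size([5, 2, 4])
```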