There are a few places where Keras defaults differ from PyTorch's that you would want to be aware of. If you search the forum for "Keras LSTM", you will find a number of threads on the subject, including the one below, which might be a good starting point.
Thank you for your reply. I'm not sure whether my input size in each layer is correct. Suppose the size of the input data is (256, 48). After passing through an embedding layer with 10 units, it becomes (256, 48, 10). Following that with an LSTM layer with 50 units, the size becomes (256, 48, 50). However, the expected output shape is (256, 50), so I use lstm_out = lstm_out[:, -1, :] to adjust the size. Is this a reasonable adjustment? Thank you.
So one thing you need to do to get this to work is to pass batch_first=True when instantiating the LSTM, if a batch-first layout is what you want (PyTorch's default input layout is (seq_len, batch, features)).
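A minimal sketch of the shape flow described above, with the sizes from the question (the vocabulary size of 1000 is an assumption, since it was not given):

```python
import torch
import torch.nn as nn

# Hypothetical input: batch of 256 sequences, 48 token ids each
x = torch.randint(0, 1000, (256, 48))            # (batch, seq_len)

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=10)
lstm = nn.LSTM(input_size=10, hidden_size=50, batch_first=True)

emb = embedding(x)                # (256, 48, 10)
lstm_out, (h_n, c_n) = lstm(emb)  # lstm_out: (256, 48, 50)
last = lstm_out[:, -1, :]         # (256, 50), the last timestep
```

Note that with batch_first=True the batch dimension stays first throughout, so the indexing lstm_out[:, -1, :] picks the last timestep per sequence as intended.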
While taking the last timestep (as you do with lstm_out[:, -1, :]) is certainly a common way to set up sequence-to-one problems (assuming your inputs are all of the same length), I would not call it a "size adjustment": it says that the LSTM should "memorize" the relevant information in its last output. Other ways are possible; in fast.ai's (Howard and Ruder's) ULMFiT, they recommend concatenating the last timestep with the average and max over time (but then the linear layer would have 3 * hidden_size inputs, of course).
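The concat-pooling idea from ULMFiT could be sketched like this (a minimal illustration with the shapes from this thread, not the fast.ai implementation; the single-output linear layer is an assumption):

```python
import torch
import torch.nn as nn

# Stand-in for LSTM output with batch_first=True: (batch, seq_len, hidden)
lstm_out = torch.randn(256, 48, 50)

last = lstm_out[:, -1, :]      # (256, 50), last timestep
avg = lstm_out.mean(dim=1)     # (256, 50), average over time
mx, _ = lstm_out.max(dim=1)    # (256, 50), max over time

# Concatenate along the feature dimension: (256, 3 * 50) = (256, 150)
pooled = torch.cat([last, avg, mx], dim=1)

# The following linear layer must therefore take 3 * hidden_size inputs
linear = nn.Linear(3 * 50, 1)  # hypothetical single-output head
out = linear(pooled)           # (256, 1)
```

The pooling gives the classifier a view of the whole sequence rather than relying on the LSTM to carry everything to the last timestep.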