How to change GRU to LSTM in Chatbot Tutorial

Hi guys,

I’m trying to use an LSTM in the Chatbot tutorial provided by PyTorch. However, I’m currently hitting the error shown below.

RuntimeError: Expected hidden[0] size (4, 64, 500), got (2, 64, 500)

Any help would be much appreciated.

Thank you and have a nice day.

The shape of hidden should not change when swapping the GRU for an LSTM.
Could you post your code here so that we could have a look?

Thank you for your reply.

self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout), bidirectional=True)

original.

self.lstm = nn.LSTM(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout), bidirectional=True)

swapped to LSTM.

The only thing that I changed was nn.GRU to nn.LSTM in both EncoderRNN and LuongAttnDecoderRNN.

Original Code Link: https://pytorch.org/tutorials/beginner/chatbot_tutorial.html

Thanks for the clarification.
I guess you are just changing the model without modifying the forward pass.
The error might be thrown if you forget to pass a cell state tensor to the LSTM.
Have a look at the docs.
Since the LSTM expects the hidden and cell states to be passed together as a tuple, a hidden tensor passed on its own will be sliced as if it were that tuple, which yields this size mismatch error.
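
A minimal sketch of the difference in what the two modules return, assuming toy sizes rather than the tutorial's values (seq_len, batch and hidden_size below are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy sizes, chosen only for illustration (not the tutorial's values)
seq_len, batch, hidden_size, n_layers = 5, 3, 8, 2
x = torch.randn(seq_len, batch, hidden_size)

gru = nn.GRU(hidden_size, hidden_size, n_layers, bidirectional=True)
lstm = nn.LSTM(hidden_size, hidden_size, n_layers, bidirectional=True)

# GRU returns a single hidden tensor
gru_out, gru_hidden = gru(x)

# LSTM returns a tuple: (hidden state, cell state)
lstm_out, lstm_hidden = lstm(x)
h_n, c_n = lstm_hidden

# When feeding a state back into an LSTM, pass both parts as a tuple
lstm_out2, _ = lstm(x, (h_n, c_n))

print(gru_hidden.shape)        # num_layers * num_directions, batch, hidden_size
print(h_n.shape, c_n.shape)    # same shape for hidden and cell state
```

Any code that treats the returned hidden value as a single tensor (slicing, summing directions, passing it on) has to handle the tuple once nn.GRU is replaced by nn.LSTM.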

Aside from the forward pass modification mentioned above, I believe it is also necessary to change part of the code in the training function. That is, instead of passing only the last encoder_hidden, make sure you pass the last (encoder_h_hidden, encoder_c_hidden) to the decoder.

Can you specify where this error is thrown, i.e., in which line of your code, or at least whether it is in the encoder or the decoder? When going through the tutorial, I cannot see why switching from GRU to LSTM should cause any trouble. I usually write my models in such a way that the choice of cell is configurable, and there are not many cases to consider.

The hidden state of both GRU and LSTM has a shape of (num_layers * num_directions, batch, hidden_size); the LSTM also has a cell state with the same shape. Looking at your error, the first dimension (i.e., num_layers * num_directions) is the one that differs. Without any more details, I would guess that something with the values for num_layers and/or num_directions might be off.
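
As a rough check of that formula against the numbers in the error message, here is a small sketch. The helper name is hypothetical, and the layer count of 2 is an assumption chosen because it reproduces the 4 and 2 in the error; batch 64 and hidden size 500 are taken from the error itself.

```python
def expected_hidden_shape(num_layers, bidirectional, batch, hidden_size):
    """Shape of h_n (and c_n for an LSTM) as described above."""
    num_directions = 2 if bidirectional else 1
    return (num_layers * num_directions, batch, hidden_size)

# Assumed: 2 layers; batch=64 and hidden_size=500 come from the error message
bidirectional_shape = expected_hidden_shape(2, True, 64, 500)
unidirectional_shape = expected_hidden_shape(2, False, 64, 500)

print(bidirectional_shape)   # (4, 64, 500)
print(unidirectional_shape)  # (2, 64, 500)
```

Under these assumptions, the two sizes in the error correspond to a bidirectional state and a unidirectional state with the same number of layers.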

My recommendation needs to be applied in both the training function and the evaluation function, wherever the decoder's initial state is set.

Example code:

# Set initial decoder hidden state to the encoder's final hidden state
# (model_type is a custom attribute added to the encoder, not part of the tutorial)
if encoder.model_type == 'GRU':
    decoder_hidden = encoder_hidden[:decoder.n_layers]

elif encoder.model_type == 'LSTM':
    # Unpack the encoder's final hidden state and cell state
    encoder_h_hidden, encoder_c_hidden = encoder_hidden
    decoder_h_hidden = encoder_h_hidden[:decoder.n_layers]
    decoder_c_hidden = encoder_c_hidden[:decoder.n_layers]
    # Recombine the sliced states into a hidden tuple
    decoder_hidden = (decoder_h_hidden, decoder_c_hidden)

Thanks, it worked for me. Just to add: put the code below in both the train function and the evaluate function.

# Set initial decoder hidden state to the encoder's final hidden state
# Get the encoder final h_hidden_state
encoder_h_hidden, encoder_c_hidden = encoder_hidden
decoder_h_hidden = encoder_h_hidden[:decoder.n_layers]
decoder_c_hidden = encoder_c_hidden[:decoder.n_layers]
# Recombine the final hidden states as hidden tuple
decoder_hidden = (decoder_h_hidden, decoder_c_hidden)
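
Since the same slicing is needed in both the train and evaluate functions, one option is to factor it into a small helper. This is only a sketch: the function name init_decoder_hidden is hypothetical, and it detects the LSTM case by checking for a tuple instead of relying on a model_type attribute. The zero tensors stand in for a real encoder's output.

```python
import torch

def init_decoder_hidden(encoder_hidden, n_layers):
    """Slice the encoder's final state down to the decoder's layer count.

    Handles both a GRU state (single tensor) and an LSTM state
    (tuple of hidden and cell tensors).
    """
    if isinstance(encoder_hidden, tuple):
        # LSTM: slice hidden and cell state separately, then recombine
        h_n, c_n = encoder_hidden
        return (h_n[:n_layers], c_n[:n_layers])
    # GRU: a single hidden tensor
    return encoder_hidden[:n_layers]

# Stand-in encoder states shaped like a 2-layer bidirectional encoder's output
h = torch.zeros(4, 64, 500)
c = torch.zeros(4, 64, 500)

lstm_init = init_decoder_hidden((h, c), n_layers=2)
gru_init = init_decoder_hidden(h, n_layers=2)
print(lstm_init[0].shape, lstm_init[1].shape, gru_init.shape)
```

In the tutorial's train and evaluate code, the call would take the place of the existing slicing line, e.g. decoder_hidden = init_decoder_hidden(encoder_hidden, decoder.n_layers).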

Hi @ptrblck, can you tell me how to add a state vector? The code works as is if I do what @Dean_Sumner said, but I feel like I am missing something because I am not passing the state vector. What I am trying to ask is: is it okay to do only what @Dean_Sumner said, or is there something else I have to do too? Thank you in advance.

I’m not completely sure which code you are referring to.
In this post @Dean_Sumner creates the hidden and cell states.

@ptrblck Yeah I tried it and it works. Thank you so much. I was referring to Chatbot Tutorial — PyTorch Tutorials 1.7.1 documentation