How to change GRU to LSTM in Chatbot Tutorial

StephKua · November 24, 2018, 11:35am

Hi guys,

I’m trying to use an LSTM in the Chatbot tutorial provided by PyTorch. However, I’m currently getting the error shown below.

RuntimeError: Expected hidden[0] size (4, 64, 500), got (2, 64, 500)

Any help would be much appreciated.

Thank you and have a nice day.

ptrblck · November 24, 2018, 10:05pm

The shape of hidden should not change when swapping the GRU for an LSTM.
Could you post your code here so that we could have a look?

StephKua · November 24, 2018, 10:41pm

Thank you for your reply.

self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout), bidirectional=True)

original.

self.lstm = nn.LSTM(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout), bidirectional=True)

swapped to LSTM.

The only thing I changed was nn.GRU to nn.LSTM in both EncoderRNN and LuongAttnDecoderRNN.

Original Code Link: https://pytorch.org/tutorials/beginner/chatbot_tutorial.html

ptrblck · November 24, 2018, 11:32pm

Thanks for the clarification.
I guess you are just changing the model without modifying the forward pass.
The error might be thrown if you forget to pass the cell state tensor to the LSTM.
Have a look at the docs.
Since the hidden and cell states are expected to be passed as a tuple, hidden will be sliced if you pass it without the cell state, which yields this size mismatch error.
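
To illustrate the difference in the state that the two modules return and accept, here is a minimal sketch. The sizes are small placeholder values chosen for a quick run, not the tutorial's settings, and the variable names are made up for this example.

```python
import torch
import torch.nn as nn

# Small illustrative sizes (assumptions, not the tutorial's values).
n_layers, batch_size, hidden_size, seq_len = 2, 3, 8, 5

gru = nn.GRU(hidden_size, hidden_size, n_layers, bidirectional=True)
lstm = nn.LSTM(hidden_size, hidden_size, n_layers, bidirectional=True)

x = torch.randn(seq_len, batch_size, hidden_size)  # (seq_len, batch, input_size)

# GRU returns its state as a single tensor.
gru_out, gru_hidden = gru(x)

# LSTM returns its state as a tuple: (hidden state, cell state).
lstm_out, (h_n, c_n) = lstm(x)

# When feeding a state back in, the GRU takes the bare tensor,
# whereas the LSTM needs the whole (h, c) tuple.
_, gru_hidden_2 = gru(x, gru_hidden)
_, (h_2, c_2) = lstm(x, (h_n, c_n))

print(tuple(gru_hidden.shape))             # (4, 3, 8)
print(tuple(h_n.shape), tuple(c_n.shape))  # (4, 3, 8) (4, 3, 8)
```

So any place in the forward pass, training loop, or evaluation code that treats hidden as one tensor (indexing, slicing, passing it on) has to handle the tuple once the GRU is replaced by an LSTM.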

Dean_Sumner · January 1, 2020, 1:57pm

Aside from the forward pass modification mentioned above, I believe it is also necessary to change part of the code in the training function, i.e., instead of passing only the last encoder_hidden, make sure you pass the last (encoder_h_hidden, encoder_c_hidden) to the decoder.

vdw · January 2, 2020, 7:01am

Can you specify where this error is thrown, i.e., in which line of your code, or at least whether it is in the encoder or the decoder? When going through the tutorial, I cannot see why switching from GRU to LSTM should cause any trouble. I usually write my models in such a way that the choice of cell is configurable, and there are not many cases to consider.

The hidden state of both GRU and LSTM has a shape of (num_layers * num_directions, batch, hidden_size); an LSTM also has a cell state with the same shape. Looking at your error, the mismatch is in the first dimension (i.e., num_layers * num_directions). Without any more details, I would guess that something with the values for num_layers and/or num_directions might be off.
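
As a quick sanity check of that shape rule, the arithmetic can be sketched as below. The helper name expected_state_shape is hypothetical, and the values (2 layers, batch 64, hidden size 500) are only chosen to be consistent with the error message; the actual configuration was not posted.

```python
def expected_state_shape(num_layers, bidirectional, batch_size, hidden_size):
    """Shape of h_n (and of c_n for an LSTM) for the given configuration."""
    num_directions = 2 if bidirectional else 1
    return (num_layers * num_directions, batch_size, hidden_size)

# Assumed values, picked to match the numbers in the error message.
print(expected_state_shape(2, True, 64, 500))   # (4, 64, 500)
print(expected_state_shape(2, False, 64, 500))  # (2, 64, 500)
```

With these assumed values, a first dimension of 4 versus 2 is exactly the difference between a bidirectional and a unidirectional module with two layers.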

Dean_Sumner · January 13, 2020, 10:57am

My recommendations need to be applied in both the training function and the evaluation function when setting the decoder's initial state.

Example code:

# Set initial decoder hidden state to the encoder's final hidden state
if encoder.model_type == 'GRU':
    decoder_hidden = encoder_hidden[:decoder.n_layers]

elif encoder.model_type == 'LSTM':
    # Get the encoder's final hidden and cell states
    encoder_h_hidden, encoder_c_hidden = encoder_hidden
    decoder_h_hidden = encoder_h_hidden[:decoder.n_layers]
    decoder_c_hidden = encoder_c_hidden[:decoder.n_layers]
    # Recombine the final hidden states as hidden tuple
    decoder_hidden = (decoder_h_hidden, decoder_c_hidden)
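
The snippet relies on a model_type attribute, which as far as I can tell the tutorial's EncoderRNN does not define, so it would have to be added by hand. A possible alternative, sketched here under that assumption, is to branch on whether the encoder state is a tuple. The helper name init_decoder_state is hypothetical, and plain lists stand in for tensors so the sketch runs without PyTorch; slicing along the first dimension works the same way on tensors.

```python
def init_decoder_state(encoder_state, n_layers):
    """Keep the first n_layers entries of the encoder's final state.

    Accepts either a single hidden state (GRU) or an (h, c) tuple (LSTM).
    """
    if isinstance(encoder_state, tuple):
        h, c = encoder_state
        return (h[:n_layers], c[:n_layers])
    return encoder_state[:n_layers]

# Lists stand in for tensors whose first dimension is
# num_layers * num_directions (here 4).
gru_state = ["s0", "s1", "s2", "s3"]
lstm_state = (["h0", "h1", "h2", "h3"], ["c0", "c1", "c2", "c3"])

print(init_decoder_state(gru_state, 2))   # ['s0', 's1']
print(init_decoder_state(lstm_state, 2))  # (['h0', 'h1'], ['c0', 'c1'])
```

The same call could then be used in both the training and the evaluation function, regardless of which cell type the encoder uses.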

Sandip_More · April 24, 2020, 5:28am

Thanks, it worked for me. Just to add: put the code below in both the train function and the evaluate function.

# Set initial decoder hidden state to the encoder's final hidden state

# Get the encoder's final hidden and cell states
encoder_h_hidden, encoder_c_hidden = encoder_hidden
decoder_h_hidden = encoder_h_hidden[:decoder.n_layers]
decoder_c_hidden = encoder_c_hidden[:decoder.n_layers]

# Recombine the final hidden states as hidden tuple
decoder_hidden = (decoder_h_hidden, decoder_c_hidden)

rionel_dmello · February 21, 2021, 2:52am

Hi @ptrblck, can you tell me how to add a state vector? The code works as is if I do what @Dean_Sumner said, but I feel like I am missing something because I am not passing the state vector. What I am trying to ask is “Is it okay to do only what @Dean_Sumner said, or is there something else I have to do too?”. Thank you in advance.

ptrblck · February 22, 2021, 7:20am

I’m not completely sure which code you are referring to.
In this post @Dean_Sumner creates the hidden and cell states.

rionel_dmello · February 22, 2021, 8:03pm

@ptrblck Yeah I tried it and it works. Thank you so much. I was referring to Chatbot Tutorial — PyTorch Tutorials 1.7.1 documentation