How to implement TensorFlow bi-directional LSTM code in PyTorch

from tensorflow import keras
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional

def encoder():
    enc = keras.Sequential()
    enc.add(Input(shape=(maxlen,), name='Encoder-Input'))
    enc.add(Embedding(num_words, embed_dim, input_length=maxlen,
                      name='Body-Word-Embedding', mask_zero=False))
    # Bidirectional LSTM with 128 units per direction -> 256-dim encoding
    enc.add(Bidirectional(LSTM(128, activation='relu', name='Encoder-Last-LSTM')))
    return enc


Here I get an output of shape batch x 256.
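
A minimal shape check (just a sketch; maxlen, num_words, and embed_dim are placeholder values, not taken from the post):

import numpy as np

# Placeholder hyperparameters, only for the shape check
maxlen, num_words, embed_dim = 20, 5000, 128

enc = encoder()
dummy = np.random.randint(0, num_words, size=(16, maxlen))  # batch of 16 token sequences
print(enc.predict(dummy).shape)  # (16, 256): 128 units per direction, concatenated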

In pytorch:

import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self):
        super(EncoderRNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # 2-layer bidirectional LSTM; hidden size equals embed_dim
        self.lstm = nn.LSTM(embed_dim, embed_dim, 2, bidirectional=True, batch_first=False)
        self.relu = nn.ReLU()

    def forward(self, input_, hidden, c):
        embedded = self.embedding(input_).transpose(0, 1)  # (seq_len, batch, embed_dim)
        output, (hn, cn) = self.lstm(embedded, (hidden, c))
        output = self.relu(output)
        return output, hn, cn

    def initHidden(self):
        # (num_layers * num_directions, batch, hidden_dim) = (2*2, 16, 128)
        return (torch.zeros(2*2, 16, 128, device=device),
                torch.zeros(2*2, 16, 128, device=device))
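
For reference, a hedged usage sketch (the batch size 16 and hidden size 128 come from initHidden above; vocab_size and the sequence length are placeholders):

device = torch.device('cpu')
vocab_size, embed_dim = 5000, 128   # embed_dim must be 128 to match initHidden

enc = EncoderRNN().to(device)
h0, c0 = enc.initHidden()
tokens = torch.randint(0, vocab_size, (16, 20), device=device)  # (batch, seq_len)

output, hn, cn = enc(tokens, h0, c0)
print(output.shape)  # (seq_len, batch, 2 * 128) = (20, 16, 256)
print(hn.shape)      # (num_layers * num_directions, batch, 128) = (4, 16, 128)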

But here I get an output of shape seq_len x batch x 256, and I want batch x 256.

How can I implement the exact TensorFlow code in PyTorch?

output will contain the last hidden state (“last” w.r.t. the number of layers) for all time steps. You want to use hn, which gives you the last hidden states (“last” w.r.t. the number of time steps) for all layers.

You might also check out the docs to make sure you handle the bidirectional setup correctly.
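
Concretely, and building on the sketch above (a 2-layer bidirectional LSTM with hidden size 128), one way to get the batch x 256 encoding is to take the last layer’s forward and backward states from hn and concatenate them, analogous to what Keras’ Bidirectional wrapper returns:

output, hn, cn = enc(tokens, h0, c0)

# hn: (num_layers * num_directions, batch, hidden_dim) = (4, 16, 128).
# View it as (num_layers, num_directions, batch, hidden_dim) and keep the last layer.
num_layers, hidden_dim = 2, 128
hn = hn.view(num_layers, 2, hn.size(1), hidden_dim)
last_fwd, last_bwd = hn[-1, 0], hn[-1, 1]        # each (batch, 128)
latent = torch.cat([last_fwd, last_bwd], dim=1)  # (batch, 256)
print(latent.shape)  # torch.Size([16, 256])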

Thank you. I am also confused about the shape of the last hidden state, as it has shape:

[number of layers x batch x hidden_dim] → if I use 5 layers → [5, 64, 256].

Should it not be [batch, hidden_dim], since I want the latent representation of the input? What does the 5 signify when I pass it to the decoder?

You can simply check the docs: the output shape of hn is (D*num_layers, N, H_out), where D is the number of directions (1 or 2), N is the batch size, and H_out is your hidden_dim.

Are you sure you used bidirectional=True to get this result? With num_layers=5, the shape of hn should be (10, 64, 256), with 10 = 5*2 reflecting the 5 layers and the 2 directions.
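
A quick hedged check of those shapes (the input size 128 and the sequence length are made up; batch 64 and hidden_dim 256 match the numbers above):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=5, bidirectional=True)
x = torch.randn(30, 64, 128)   # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)
print(hn.shape)      # torch.Size([10, 64, 256]) -> (5 layers * 2 directions, batch, hidden_dim)
print(output.shape)  # torch.Size([30, 64, 512]) -> (seq_len, batch, 2 * hidden_dim)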

Yes, sorry about the 5, I was talking in terms of a general LSTM, but yes, with a bidirectional LSTM and 5 layers it’s 10. My question is: what does this 10 tell us when we pass it to the decoder? Does the decoder also have to have 5 bidirectional layers?

It depends on your task. For an autoencoder, for example, you can set up your decoder the same way as your encoder and simply copy (hn, cn) over.

If you are generating text sequences (e.g., for machine translation), a bidirectional decoder doesn’t make sense, since you generate the output sequence step by step. In this case you have to see how to best make the hidden state from the encoder a valid hidden state for the decoder. The easiest would probably be to give the decoder LSTM the same number of layers as the encoder LSTM, but with bidirectional=False, and to bring the encoder hidden state from (10, 64, 256) to (5, 64, 256) by adding the two directions. It really depends on what your decoder looks like.
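
A hedged sketch of that direction-merging step (the concrete sizes follow the (10, 64, 256) example above; the input size 128 and the dummy inputs are made up):

import torch
import torch.nn as nn

num_layers, batch, hidden = 5, 64, 256

# Stand-in encoder; in practice hn, cn come from your bidirectional encoder
enc_lstm = nn.LSTM(input_size=128, hidden_size=hidden, num_layers=num_layers, bidirectional=True)
_, (hn, cn) = enc_lstm(torch.randn(30, batch, 128))  # hn, cn: (10, 64, 256)

def merge_directions(state):
    # (10, 64, 256) -> (5, 2, 64, 256) -> sum over the direction axis -> (5, 64, 256)
    return state.view(num_layers, 2, batch, hidden).sum(dim=1)

dec_h0, dec_c0 = merge_directions(hn), merge_directions(cn)

# A unidirectional decoder with the same number of layers can take these states directly
decoder = nn.LSTM(input_size=128, hidden_size=hidden, num_layers=num_layers, bidirectional=False)
dec_out, _ = decoder(torch.randn(1, batch, 128), (dec_h0, dec_c0))
print(dec_out.shape)  # torch.Size([1, 64, 256])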


Thank you so much for your help!