How to implement TensorFlow bi-directional LSTM code in PyTorch

from tensorflow import keras
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional

def encoder():
    enc = keras.Sequential()
    enc.add(Input(shape=(maxlen,), name='Encoder-Input'))
    enc.add(Embedding(num_words, embed_dim, input_length=maxlen,
                      name='Body-Word-Embedding', mask_zero=False))
    # Bidirectional LSTM with 128 units per direction -> 256-dim encoding
    enc.add(Bidirectional(LSTM(128, activation='relu', name='Encoder-Last-LSTM')))
    return enc


Here I get an output of shape batch x 256.
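
A minimal shape check (just a sketch; maxlen, num_words, and embed_dim are placeholder values, not taken from the post):

import numpy as np

# Placeholder hyperparameters, only for the shape check
maxlen, num_words, embed_dim = 20, 5000, 128

enc = encoder()
dummy = np.random.randint(0, num_words, size=(16, maxlen))  # batch of 16 token sequences
print(enc.predict(dummy).shape)  # (16, 256): 128 units per direction, concatenated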

In pytorch:

import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self):
        super(EncoderRNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # 2-layer bidirectional LSTM; hidden size equals embed_dim
        self.lstm = nn.LSTM(embed_dim, embed_dim, 2, bidirectional=True, batch_first=False)
        self.relu = nn.ReLU()

    def forward(self, input_, hidden, c):
        embedded = self.embedding(input_).transpose(0, 1)  # (seq_len, batch, embed_dim)
        output, (hn, cn) = self.lstm(embedded, (hidden, c))
        output = self.relu(output)
        return output, hn, cn

    def initHidden(self):
        # (num_layers * num_directions, batch, hidden_dim) = (2*2, 16, 128)
        return (torch.zeros(2*2, 16, 128, device=device),
                torch.zeros(2*2, 16, 128, device=device))
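
For reference, a hedged usage sketch (the batch size 16 and hidden size 128 come from initHidden above; vocab_size and the sequence length are placeholders):

device = torch.device('cpu')
vocab_size, embed_dim = 5000, 128   # embed_dim must be 128 to match initHidden

enc = EncoderRNN().to(device)
h0, c0 = enc.initHidden()
tokens = torch.randint(0, vocab_size, (16, 20), device=device)  # (batch, seq_len)

output, hn, cn = enc(tokens, h0, c0)
print(output.shape)  # (seq_len, batch, 2 * 128) = (20, 16, 256)
print(hn.shape)      # (num_layers * num_directions, batch, 128) = (4, 16, 128)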

But here I get an output of shape seq_len x batch x 256, and I want batch x 256.

How can I implement the exact TensorFlow code in PyTorch?

output will contain the last hidden state (“last” w.r.t. the number of layers) for all time steps. You want to use hn, which gives you the last hidden states (“last” w.r.t. the number of time steps) for all layers.

You might also check out the docs to make sure you handle the bidirectional setup correctly.
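
Concretely, and building on the sketch above (a 2-layer bidirectional LSTM with hidden size 128), one way to get the batch x 256 encoding is to take the last layer’s forward and backward states from hn and concatenate them, analogous to what Keras’ Bidirectional wrapper returns:

output, hn, cn = enc(tokens, h0, c0)

# hn: (num_layers * num_directions, batch, hidden_dim) = (4, 16, 128).
# View it as (num_layers, num_directions, batch, hidden_dim) and keep the last layer.
num_layers, hidden_dim = 2, 128
hn = hn.view(num_layers, 2, hn.size(1), hidden_dim)
last_fwd, last_bwd = hn[-1, 0], hn[-1, 1]        # each (batch, 128)
latent = torch.cat([last_fwd, last_bwd], dim=1)  # (batch, 256)
print(latent.shape)  # torch.Size([16, 256])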

Thank you. I am also confused about the shape of the last hidden state, as it has shape:

[number of layers x batch x hidden_dim] → if I use 5 layers → [5, 64, 256].

Should it not be [batch, hidden_dim], since I want the latent representation of the input? What does the 5 signify when I pass it to the decoder?

You can simply check the docs: the output shape of hn is (D*num_layers, N, H_out), where D is the number of directions (1 or 2), N is the batch size, and H_out is your hidden_dim.

Are you sure you used bidirectional=True to get this result? With num_layers=5, the shape of hn should be (10, 64, 256), with 10 = 5*2 reflecting the 5 layers and the 2 directions.
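
A quick hedged check of those shapes (the input size 128 and the sequence length are made up; batch 64 and hidden_dim 256 match the numbers above):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=5, bidirectional=True)
x = torch.randn(30, 64, 128)   # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)
print(hn.shape)      # torch.Size([10, 64, 256]) -> (5 layers * 2 directions, batch, hidden_dim)
print(output.shape)  # torch.Size([30, 64, 512]) -> (seq_len, batch, 2 * hidden_dim)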

Yes, sorry about the 5, I was talking in terms of a general LSTM, but yes, with a bidirectional LSTM and 5 layers it’s 10. My question is: what does this 10 tell us when we pass it to the decoder? Does the decoder also have to have 5 bidirectional layers?

It depends on your task. For an autoencoder, for example, you can set up your decoder the same way as your encoder and simply copy (hn, cn) over.

If you are generating text sequences (e.g., for machine translation), a bidirectional decoder doesn’t make sense, since you generate the output sequence step by step. In this case you have to see how to best make the hidden state from the encoder a valid hidden state for the decoder. The easiest would probably be to give the decoder LSTM the same number of layers as the encoder LSTM, but with bidirectional=False, and to bring the encoder hidden state from (10, 64, 256) to (5, 64, 256) by adding the two directions. It really depends on what your decoder looks like.
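
A hedged sketch of that direction-merging step (the concrete sizes follow the (10, 64, 256) example above; the input size 128 and the dummy inputs are made up):

import torch
import torch.nn as nn

num_layers, batch, hidden = 5, 64, 256

# Stand-in encoder; in practice hn, cn come from your bidirectional encoder
enc_lstm = nn.LSTM(input_size=128, hidden_size=hidden, num_layers=num_layers, bidirectional=True)
_, (hn, cn) = enc_lstm(torch.randn(30, batch, 128))  # hn, cn: (10, 64, 256)

def merge_directions(state):
    # (10, 64, 256) -> (5, 2, 64, 256) -> sum over the direction axis -> (5, 64, 256)
    return state.view(num_layers, 2, batch, hidden).sum(dim=1)

dec_h0, dec_c0 = merge_directions(hn), merge_directions(cn)

# A unidirectional decoder with the same number of layers can take these states directly
decoder = nn.LSTM(input_size=128, hidden_size=hidden, num_layers=num_layers, bidirectional=False)
dec_out, _ = decoder(torch.randn(1, batch, 128), (dec_h0, dec_c0))
print(dec_out.shape)  # torch.Size([1, 64, 256])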


Thank you so much for your help!