How to keep hidden state of the last layer of packed GRU

Hi everybody,

I am playing with seq2seq for NMT and I was trying to add several layers to my working GRU model. Unfortunately, I see that the dimensions of the hidden state tensor are affected by the number of layers.

If I want to keep only the last GRU layer’s hidden state, I need to slice my h_n tensor… but I am lost on how to do it… :face_with_hand_over_mouth:

Or, how can I concatenate the hidden states of the multiple layers to get a final hidden state vector with the same shape as if I were using a single-layer GRU?

Thanks in advance

Best regards

Jerome

def forward(self, x_source, x_lengths):
    """
    Forward pass
    :param x_source: input batch of sequences
    :param x_lengths: length of each sequence
    :return: x_unpacked, x_birnn_h
    """

    # apply the embedding on the input sequences
    x_embedded = self._source_embedding(x_source)

    # create the packed sequences structure
    x_packed = pack_padded_sequence(
        x_embedded, 
        x_lengths.detach().cpu().numpy(),
        batch_first=True
    )

    # apply the rnn
    x_birnn_out, x_birnn_h = self._birnn(x_packed)
    if self._num_layers > 1:
        # TO DO: keep the hidden state of the last layer only...
        pass

    # permute the dimensions of the hidden state and flatten it
    x_birnn_h = x_birnn_h.permute(1, 0, 2)
    x_birnn_h = x_birnn_h.contiguous().view(x_birnn_h.size(0), -1)

    # unpack the sequences before returning them
    x_unpacked, _ = pad_packed_sequence(x_birnn_out, batch_first=True)

    return x_unpacked, x_birnn_h

According to the docs, the shape of x_birnn_h is (num_layers * num_directions, batch, hidden_size). A common way to work with the last hidden states is:

output, hidden = self._birnn(x)
# hidden.shape = (n_layers * n_directions, batch_size, hidden_dim)
hidden = hidden.view(n_layers, n_directions, batch_size, hidden_dim)
# This view() comes directly from the PyTorch docs
# hidden.shape = (n_layers, n_directions, batch_size, hidden_dim)
hidden = hidden[-1]
# hidden.shape = (n_directions, batch_size, hidden_dim)
hidden_forward, hidden_backward = hidden[0], hidden[1]
# Both shapes (batch_size, hidden_dim)
hidden = torch.cat((hidden_forward, hidden_backward), dim=1)
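
Applied to the forward() in the question, a minimal sketch of how that could look (the attribute names self._birnn and self._num_layers come from the original code; I’m assuming the GRU was created with bidirectional=True, which nn.GRU exposes as the bidirectional attribute):

x_birnn_out, x_birnn_h = self._birnn(x_packed)
# x_birnn_h.shape = (num_layers * num_directions, batch, hidden_size)
num_directions = 2 if self._birnn.bidirectional else 1
x_birnn_h = x_birnn_h.view(
    self._num_layers, num_directions, x_birnn_h.size(1), x_birnn_h.size(2)
)
# keep only the last layer: (num_directions, batch, hidden_size)
x_birnn_h = x_birnn_h[-1]
# then flatten the directions exactly as before
x_birnn_h = x_birnn_h.permute(1, 0, 2)                           # (batch, num_directions, hidden_size)
x_birnn_h = x_birnn_h.contiguous().view(x_birnn_h.size(0), -1)   # (batch, num_directions * hidden_size)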

Edit: If you want the last hidden state to have the same shape as with a unidirectional GRU, you can also add hidden_forward and hidden_backward (instead of concatenating them). I’ve seen it done, but I don’t really know what the best practice is here.
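
A minimal sketch of that alternative, assuming hidden already holds only the last layer as above, i.e. has shape (n_directions, batch_size, hidden_dim):

hidden_forward, hidden_backward = hidden[0], hidden[1]
# element-wise sum keeps hidden_dim, so the result matches a unidirectional GRU
hidden = hidden_forward + hidden_backward   # (batch_size, hidden_dim)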


Thanks Chris, it helps a lot.