Proper way of setting h,c in bidirectional RNN Layers

Hello everybody,

I’m having trouble understanding how bidirectional RNN layers handle non-zero hidden-state initialization, i.e. when h and c come from elsewhere (e.g. other layers). For example, in this case:

import torch
from torch import nn

input_size = 10
hidden_size = 20
n_layers = 4
batch_size = 5
seq_len = 40

b_lstm = nn.LSTM(input_size, hidden_size, n_layers,
                 batch_first=True, bidirectional=True)

# variables
# input
input = torch.randn(batch_size, seq_len, input_size)

# forward input hidden variables
h0 = ... # torch.randn(n_layers, batch_size, hidden_size)
c0 = ... # torch.randn(n_layers, batch_size, hidden_size)

# backwards input hidden variables
h1 = ... # torch.randn(n_layers, batch_size, hidden_size)
c1 = ... # torch.randn(n_layers, batch_size, hidden_size)

The way that seems most practical is to simply concatenate them along the first dimension:

# first way of stacking
h = torch.cat([h0, h1], dim=0)
c = torch.cat([c0, c1], dim=0)

# first stacking execution
output, (hn, cn) = b_lstm(input, (h, c))
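For reference, here is a minimal sketch (with made-up sizes) of what this first concatenation actually produces: the rows come out in block order, with all forward-direction layers first and all backward-direction layers second, rather than alternating per layer.

```python
import torch

n_layers, batch_size, hidden_size = 4, 5, 20
h0 = torch.randn(n_layers, batch_size, hidden_size)  # forward states
h1 = torch.randn(n_layers, batch_size, hidden_size)  # backward states

# concatenating along dim 0 stacks the two blocks end to end
h = torch.cat([h0, h1], dim=0)  # shape (2 * n_layers, batch_size, hidden_size)

# rows 0..n_layers-1 are all forward states, the rest are all backward states
assert h.shape == (2 * n_layers, batch_size, hidden_size)
assert torch.equal(h[:n_layers], h0)
assert torch.equal(h[n_layers:], h1)
```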

However, when comparing hn and cn for simple geometries, it appears that hn is interleaved per layer, hn = [h00, h01, h10, h11, ..., hn0, hn1], and not blocked by direction as I would expect, hn = [h00, h10, h20, ..., hn0, h01, h11, h21, ..., hn1]. Hence this other way of combining the variables:
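This interleaving can be checked empirically: PyTorch’s returned hidden state of shape (num_layers * num_directions, batch, hidden) can be viewed as (num_layers, num_directions, batch, hidden), so the two directions of the same layer are adjacent. A small sketch with made-up sizes:

```python
import torch
from torch import nn

n_layers, batch_size, hidden_size, input_size, seq_len = 2, 3, 4, 5, 7
lstm = nn.LSTM(input_size, hidden_size, n_layers,
               batch_first=True, bidirectional=True)
x = torch.randn(batch_size, seq_len, input_size)
out, (hn, cn) = lstm(x)

# hn has shape (n_layers * 2, batch, hidden); viewing it as
# (n_layers, 2, batch, hidden) makes direction the fast (inner) index,
# i.e. the layout is [layer0-fwd, layer0-bwd, layer1-fwd, layer1-bwd, ...]
hn_view = hn.view(n_layers, 2, batch_size, hidden_size)

# forward direction: its final hidden state (top layer) is the last
# time step of the first half of the output features
assert torch.allclose(out[:, -1, :hidden_size], hn_view[-1, 0])
# backward direction: its final hidden state corresponds to t=0 of the
# second half of the output features
assert torch.allclose(out[:, 0, hidden_size:], hn_view[-1, 1])
```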

# second way of stacking
h_ = torch.cat([h0.reshape(-1, hidden_size * batch_size),
                h1.reshape(-1, hidden_size * batch_size)],
               dim=1).reshape(-1, batch_size, hidden_size)
c_ = torch.cat([c0.reshape(-1, hidden_size * batch_size),
                c1.reshape(-1, hidden_size * batch_size)],
               dim=1).reshape(-1, batch_size, hidden_size)

# second stacking execution
output_, (hn_, cn_) = b_lstm(input, (h_, c_))
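For what it’s worth, an equivalent and arguably clearer way to produce the same per-layer interleaving (a sketch with made-up sizes, not an official recipe) is to stack along a new direction axis and then flatten layers and directions together:

```python
import torch

n_layers, batch_size, hidden_size = 4, 5, 20
h0 = torch.randn(n_layers, batch_size, hidden_size)  # forward states
h1 = torch.randn(n_layers, batch_size, hidden_size)  # backward states

# stack along a new "direction" axis (layer, direction, batch, hidden),
# then merge layer and direction so rows come out as
# [h0_layer0, h1_layer0, h0_layer1, h1_layer1, ...]
h = torch.stack([h0, h1], dim=1).reshape(n_layers * 2, batch_size, hidden_size)

# the two directions of layer 0 are now adjacent
assert torch.equal(h[0], h0[0])
assert torch.equal(h[1], h1[0])
```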

Does anybody have experience with this matter?