Missing or conflicting documentation between versions?

I don’t use PyTorch as often as I should, so I always need to consult the documentation. Now I’ve come across an issue that was well documented in previous versions but, as far as I can tell, is no longer covered in the current one.

The question is about handling the last hidden state h_n of an nn.LSTM layer (the same applies to nn.GRU). The issue is that its first dimension is the product of num_layers and num_directions. The documentation for PyTorch version 1.0.0 is pretty clear:

h_n of shape (num_layers*num_directions, batch, hidden_size): tensor containing the hidden state for t=seq_len. Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size) and similarly for c_n.
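
In code, that recipe looks roughly like this (a minimal sketch with a toy 2-layer bidirectional nn.LSTM; all sizes are made up for illustration):

import torch
import torch.nn as nn

# Toy sizes, made up for illustration
num_layers, num_directions = 2, 2
seq_len, batch, input_size, hidden_size = 5, 3, 10, 16

lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

# h_n comes out as (num_layers * num_directions, batch, hidden_size) = (4, 3, 16)
print(h_n.shape)

# The split suggested by the old documentation
h_n_split = h_n.view(num_layers, num_directions, batch, hidden_size)
print(h_n_split.shape)  # (2, 2, 3, 16)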

This concrete recipe for separating the num_layers and num_directions dimensions is what I have always used in my implementations. However, the documentation for PyTorch version 1.13 reads as follows:

h_n: tensor of shape (D*num_layers, H_out) for unbatched input or (D*num_layers, N, H_out) containing the final hidden state for each element in the sequence. When bidirectional=True, h_n will contain a concatenation of the final forward and reverse hidden states, respectively.

While mapping the names (D=num_directions, N=batch, H_out=hidden_size) is straightforward, the documentation no longer shows how to split the D and num_layers dimensions. It’s tempting to adopt the old method:

h_n.view(num_layers, D, N, H_out)

but note that the order of D and num_layers is now flipped: (D*num_layers, …) vs. (num_layers*num_directions, …). Does this mean I now have to do:

h_n.view(D, num_layers, N, H_out)

I’m pretty sure the order matters, but I can’t work out for certain which version is the correct one. Or what am I missing here?
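
For reference, this is the sanity check I had in mind: build a toy 2-layer bidirectional nn.LSTM and compare both candidate reshapes against the per-direction slices of output (if I read the output docs correctly, the top layer’s final forward state sits at the last time step and its final backward state at the first time step; all sizes are made up):

import torch
import torch.nn as nn

# Toy sizes, made up
num_layers, D = 2, 2                            # D = num_directions
seq_len, N, input_size, H_out = 5, 3, 10, 16    # N = batch, H_out = hidden_size

lstm = nn.LSTM(input_size, H_out, num_layers=num_layers, bidirectional=True)
x = torch.randn(seq_len, N, input_size)
output, (h_n, _) = lstm(x)

# output is (seq_len, N, D * H_out); take the top layer's per-direction final states
fwd_final = output[-1, :, :H_out]   # forward direction at the last time step
bwd_final = output[0, :, H_out:]    # backward direction at the first time step

old_style = h_n.view(num_layers, D, N, H_out)   # ordering from the 1.0.0 docs
new_style = h_n.view(D, num_layers, N, H_out)   # ordering the new shape notation suggests

print("old ordering matches output:",
      torch.allclose(old_style[-1, 0], fwd_final) and torch.allclose(old_style[-1, 1], bwd_final))
print("new ordering matches output:",
      torch.allclose(new_style[0, -1], fwd_final) and torch.allclose(new_style[1, -1], bwd_final))

Whichever view lines up with those output slices should be the right one, but I’d appreciate confirmation from someone who knows how h_n is actually laid out.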