In language modelling with RNNs, the output of your RNN (let's denote it by **h**) is almost always a tensor with dimensions:

`h.shape = [time, batch_size, hidden_size]`

And from here, a common practice is to use a “decoding” linear layer:

`decoder = nn.Linear(hidden_size, vocab_size)`

to obtain logits with dimensions:

`logits.shape = [time, batch_size, vocab_size]`
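For concreteness, here is a minimal, self-contained version of this setup (the concrete sizes are arbitrary placeholders I picked for illustration):

```python
import torch
import torch.nn as nn

time, batch_size, hidden_size, vocab_size = 35, 20, 650, 10000  # arbitrary sizes

h = torch.randn(time, batch_size, hidden_size)  # stand-in for the RNN output
decoder = nn.Linear(hidden_size, vocab_size)    # the "decoding" layer

logits = decoder(h)
print(logits.shape)  # torch.Size([35, 20, 10000]) == [time, batch_size, vocab_size]
```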

Now, I have seen people do this in two ways. One:

`logits = decoder(h)`

This boils down to a matrix multiplication of the 3D tensor **h** with the 2D weight matrix inside the decoder (`nn.Linear` is applied to the last dimension and treats all leading dimensions as batch dimensions).

Two:

`logits = decoder(h.view(time * batch_size, hidden_size))`

i.e. they first reshape the 3D tensor into a 2D one and then pass it to the decoder, which boils down to a matrix multiplication between two 2D tensors. Then, depending on what shape we want the **logits** in, we can reshape with:

`logits = logits.view(time, batch_size, vocab_size)`
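As a sanity check, here is a quick sketch comparing the two routes (reusing the `h` and `decoder` from the setup above); they appear to produce the same values:

```python
# Approach one: apply the linear layer to the 3D tensor directly.
logits_3d = decoder(h)

# Approach two: flatten to 2D, decode, then reshape back to 3D.
logits_2d = decoder(h.view(time * batch_size, hidden_size))
logits_2d = logits_2d.view(time, batch_size, vocab_size)

# allclose rather than equal, in case the two paths differ in floating-point rounding
print(torch.allclose(logits_3d, logits_2d))  # True
```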

So, finally, my questions:

- Are the two approaches identical?
- If not, why?
- If yes, is there a best practice on which one to use, and why?