Hi there.

I have implemented a sequential neural network that has an MLP feature extractor prior to the recurrent blocks. Through experimentation I found out that torch.nn.Linear can handle data of shape (L, B, N),

where L is the time-series length, B is the batch size and N is the feature dimension.

For instance the following block works:

```
import torch

rand_tens = torch.randn(12, 4, 16)
linear_layer = torch.nn.Linear(16, 8)
output = linear_layer(rand_tens)
```

and its output has shape torch.Size([12, 4, 8]).

However, I would normally expect to have to flatten the temporal and batch dimensions so that the Linear layer operates on a tensor of shape (L*B, N). My questions are the following:

- Is this flattening (and unflattening) what torch.nn.Linear does under the hood?
- Is my thought process for processing (L, B, N) shaped data at the feature dimension with linear layers correct?
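For context, here is the check I ran to convince myself the two approaches agree numerically, comparing the direct 3D application against an explicit flatten/unflatten (the variable names are just mine for illustration):

```python
import torch

torch.manual_seed(0)
L, B, N, M = 12, 4, 16, 8  # time steps, batch size, input features, output features

rand_tens = torch.randn(L, B, N)
linear_layer = torch.nn.Linear(N, M)

# Apply the layer directly to the (L, B, N) tensor.
direct = linear_layer(rand_tens)

# Flatten time and batch dims, apply the layer, then restore the shape.
flat = linear_layer(rand_tens.reshape(L * B, N)).reshape(L, B, M)

print(torch.allclose(direct, flat))  # True on my machine
```

So at least the outputs match; I am just unsure whether this flatten/unflatten is literally what happens under the hood or whether it is a batched matmul.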

If you could point me to a reference or the part of the documentation that describes this, I would be grateful.

Thank you!