Linear on top of RNN: is the weight broadcast or not?

If an RNN layer (e.g. nn.LSTM) with batch_first=True outputs data of shape (batch, seq, featureSz), and a following Linear layer is defined as Linear(featureSz, hiddenSz), I understand the output of the Linear layer has shape (batch, seq, hiddenSz).
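
To make the setup concrete, here is a minimal sketch of what I mean (all sizes are arbitrary, chosen just for illustration):

```python
import torch
import torch.nn as nn

batch, seq, inputSz, featureSz, hiddenSz = 4, 10, 8, 32, 16

lstm = nn.LSTM(input_size=inputSz, hidden_size=featureSz, batch_first=True)
linear = nn.Linear(featureSz, hiddenSz)

x = torch.randn(batch, seq, inputSz)
out, _ = lstm(x)           # out: (batch, seq, featureSz)
y = linear(out)            # y:   (batch, seq, hiddenSz)
print(out.shape, y.shape)  # torch.Size([4, 10, 32]) torch.Size([4, 10, 16])
```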

However, I wonder how this is implemented in detail. Does the Linear layer flatten the input to (batch*seq, featureSz), feed it through the same weight matrix, and reshape the result back? Or does it broadcast the weight and apply a linear transformation at each time step independently?
I am asking because the latter method would consume more memory; in that case I would rather write my own loop over time steps instead of applying nn.Linear to the 3D data directly.
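
For clarity, these are the two implementations I have in mind (continuing from the sketch above; both produce the same values, and my question is which one PyTorch actually does internally and what that means for memory):

```python
# (a) flatten to 2D, apply one weight matrix, reshape back
y_flat = linear(out.reshape(-1, featureSz)).reshape(batch, seq, hiddenSz)

# (b) apply the same Linear at each time step in a Python loop
y_loop = torch.stack([linear(out[:, t, :]) for t in range(seq)], dim=1)

print(torch.allclose(y_flat, y_loop, atol=1e-6))  # True
```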