Could someone please explain why the shape of the hidden state in a GRU or LSTM is not the same as the output shape when batch_first=True?

I am confused. I feed my data to a GRU/LSTM model with batch_first=True.
I reshape my data to shape (batch, seq, features) and expect the model to return its results in the same layout. The output does have shape (batch, seq, features), but I don't understand why the last hidden state still seems to have shape (seq, batch, features). Could someone please explain this to me?
Here is my code.

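The first dimension of the hidden state isn’t the sequence length but num_layers * num_directions (2 for bidirectional, else 1), so its shape is (num_layers * num_directions, batch, hidden_size). batch_first only affects the input and output tensors, not the hidden state. The documentation for nn.GRU is a bit long, and the Outputs and Shape subsections, which both describe the output shapes, are towards the end.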
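To make the shapes concrete, here is a minimal sketch. The sizes (batch of 8, sequence length 7, and so on) are arbitrary values I picked for illustration and are not taken from your code, so adjust them to your data.

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes (assumptions, not from the original post)
batch, seq_len, n_features, hidden_size = 8, 7, 3, 5

# 2 layers, bidirectional -> num_layers * num_directions = 4
gru = nn.GRU(input_size=n_features, hidden_size=hidden_size,
             num_layers=2, bidirectional=True, batch_first=True)

x = torch.randn(batch, seq_len, n_features)
output, h_n = gru(x)

# output follows batch_first: (batch, seq, num_directions * hidden_size)
print(output.shape)  # torch.Size([8, 7, 10])
# h_n ignores batch_first: (num_layers * num_directions, batch, hidden_size)
print(h_n.shape)     # torch.Size([4, 8, 5])

# If you want the hidden state batch-first, swap the first two dimensions
h_n_batch_first = h_n.transpose(0, 1)
print(h_n_batch_first.shape)  # torch.Size([8, 4, 5])

# For a single-layer, unidirectional GRU, the last time step of the output
# is the same tensor as the final hidden state of the (only) layer.
gru_simple = nn.GRU(input_size=n_features, hidden_size=hidden_size,
                    batch_first=True)
out_s, h_s = gru_simple(x)
same_last_step = torch.allclose(out_s[:, -1, :], h_s[-1])
```

The same layout applies to nn.LSTM, except that it returns a tuple (h_n, c_n) in place of h_n, and both tensors have the first dimension num_layers * num_directions.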

Best regards

Thomas

P.S.: Don’t use Variable, sources that still use that have not been updated for a year.
P.P.S.: If you use triple backticks (```) before and after your code, it formats nicely as text, too. That makes it much easier to read than images are.