Why would the batch size not match?

What exactly is your task? A classification task?

If so, note that the output tensor out contains the last hidden states ("last" w.r.t. the number of layers, i.e. those of the top layer) for all time steps. For a classification task, you typically use only the hidden state of the last time step.
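To make the shapes concrete, here is a minimal sketch. It assumes a unidirectional nn.LSTM with batch_first=True; the sizes, the three-class nn.Linear head, and all variable names are made up for illustration and are not taken from your code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
batch_size, seq_len, input_size, hidden_size, num_layers = 4, 7, 10, 16, 2

# Assumption: unidirectional LSTM with batch_first=True.
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)

out, (h_n, c_n) = lstm(x)
# out: (batch_size, seq_len, hidden_size)   -> top layer, every time step
# h_n: (num_layers, batch_size, hidden_size) -> every layer, last time step

# For classification, keep only the last time step of the top layer.
last_step = out[:, -1, :]  # (batch_size, hidden_size)

# In the unidirectional case this matches the top layer's final hidden state.
same = torch.allclose(last_step, h_n[-1])

# Hypothetical classification head with 3 classes.
classifier = nn.Linear(hidden_size, 3)
logits = classifier(last_step)  # (batch_size, 3)
```

With a bidirectional RNN or padded sequences of different lengths, out[:, -1] is no longer the right thing to pick, so treat this as the simple case only.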

Additionally, you might want to have a look at this post to avoid any issues when using view().
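One common view() pitfall (possibly, but not necessarily, the one that post covers) is using view() to swap dimensions. view() only reinterprets the same memory in a new shape; it does not reorder elements the way transpose() or permute() do. A small sketch with a made-up tensor:

```python
import torch

# Hypothetical tensor, used only to show the difference.
x = torch.arange(6).view(2, 3)
# x = [[0, 1, 2],
#      [3, 4, 5]]

# view() keeps the element order in memory and just re-chunks it.
wrong = x.view(3, 2)
# wrong = [[0, 1],
#          [2, 3],
#          [4, 5]]

# transpose() actually swaps the two dimensions.
right = x.transpose(0, 1)
# right = [[0, 3],
#          [1, 4],
#          [2, 5]]

# Both have shape (3, 2), but the contents differ.
shapes_match = wrong.shape == right.shape
contents_match = torch.equal(wrong, right)
```

Since both results have the same shape, no error is raised, which is why this kind of bug can silently mix data across the batch and sequence dimensions.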