Size error between RNN fully connected output and target

If I understand it correctly, the Tensor size for an RNN is:
(batch_size x sequence_length x n_features)

batch_size: number of sequences (segments) to process
sequence_length: number of time steps to unroll
n_features: length of a one-hot-encoded vector, i.e. the vocab size

I’ve got a dataset with 73461 total characters and a vocab size of 52. I want to pass in a sequence of 100 characters and predict the 101st. That makes my input tensor ([73361 x 100 x 52]).
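For reference, here is a minimal sketch of how an input tensor of that shape could be built with sliding windows. The toy string, `seq_len = 5`, and the names `char_to_idx`, `windows`, `inputs`, and `targets` are illustrative assumptions, not the data pipeline actually used in the post.

```python
import torch
import torch.nn.functional as F

text = "hello world, hello rnn"   # stand-in for the real 73461-character corpus
vocab = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(vocab)}
encoded = torch.tensor([char_to_idx[ch] for ch in text])

seq_len = 5  # the post uses 100

# Each window holds seq_len input characters followed by the character to predict.
windows = encoded.unfold(0, seq_len + 1, 1)   # (n_chars - seq_len, seq_len + 1)
inputs = F.one_hot(windows[:, :-1], num_classes=len(vocab)).float()
targets = windows[:, -1]                      # class index of the next character

print(inputs.shape)    # (n_chars - seq_len, seq_len, vocab_size)
print(targets.shape)   # (n_chars - seq_len,)
```

With 73461 characters and a window of 100, the same arithmetic gives 73461 - 100 = 73361 samples, which matches the shapes quoted above.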

I believe that my final layer will be a fully connected layer of size ([hidden_dim x output_size]), whose output in this case is 7336100 x 52.

However, I get this error at the loss function (I’m using nn.CrossEntropyLoss()):
ValueError: Expected input batch_size (7336100) to match target batch_size (73361).

Here are the shapes:

Model Architecture: RNN(
  (rnn): RNN(52, 12, num_layers=2, batch_first=True)
  (fc): Linear(in_features=12, out_features=52, bias=True)
)

INPUT: torch.Size([73361, 100, 52])
MODEL OUTPUT: torch.Size([7336100, 52])
TARGET: torch.Size([73361])

Below is the model architecture.

import torch
import torch.nn as nn

# The model
class RNN(nn.Module):

    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(RNN, self).__init__()

        # Params
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        # The layers
        # Takes (n_batches x seq_length x n_features)
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        # Maps hidden_dim features to output_size logits
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):

        batch_size = x.size(0)

        # Initialize the hidden state first as a bunch of zeros.
        hidden = self.init_hidden(batch_size)

        # the outputs
        out, hidden = self.rnn(x, hidden)

        out = out.reshape(-1, self.hidden_dim)
        out = self.fc(out)

        return out, hidden

    def init_hidden(self, batch_size):
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)

        return hidden

I’m not sure how to reshape my output so that it matches the target.

Any help is greatly appreciated.

This line flattens the batch and sequence dimensions together, so the output has 73361 * 100 = 7336100 rows while the target has only 73361 entries:

out = out.reshape(-1, self.hidden_dim)
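To make the shape change concrete, here is a small sketch comparing the flattened output with the last time step. The tiny batch size of 4 and the random `out` tensor are assumptions chosen only for illustration; `out` stands in for what `nn.RNN` returns with `batch_first=True`.

```python
import torch

batch_size, seq_len, hidden_dim = 4, 100, 12   # small batch for illustration
out = torch.randn(batch_size, seq_len, hidden_dim)

flattened = out.reshape(-1, hidden_dim)   # one row per (sample, time step) pair
last_step = out[:, -1, :]                 # one row per sample

print(flattened.shape)  # torch.Size([400, 12])
print(last_step.shape)  # torch.Size([4, 12])
```

The flattened version has `batch_size * seq_len` rows, so the loss would need a target for every time step, not one per sequence.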

Depending on your use case, you might want to pass the last time step to the linear layer via:

out = self.fc(out[:, -1, :])

instead, or reduce the temporal dimension in some other way.
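Putting that together, a minimal sketch of the model with the last-time-step change might look like the following. The class name `LastStepRNN`, the batch of 8, and the random input and target are assumptions for illustration; the layer sizes are taken from the printed architecture above.

```python
import torch
import torch.nn as nn

class LastStepRNN(nn.Module):  # hypothetical name for the adjusted model
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        # nn.RNN initializes the hidden state to zeros when none is passed.
        out, hidden = self.rnn(x)        # out: (batch, seq_len, hidden_dim)
        out = self.fc(out[:, -1, :])     # keep only the last time step
        return out, hidden

model = LastStepRNN(input_size=52, output_size=52, hidden_dim=12, n_layers=2)
x = torch.randn(8, 100, 52)              # random stand-in for one-hot inputs
target = torch.randint(0, 52, (8,))      # one class index per sequence

logits, hidden = model(x)
loss = nn.CrossEntropyLoss()(logits, target)
print(logits.shape, target.shape)
```

Here the logits have one row per sequence, so their first dimension matches the target. Another way to reduce the temporal dimension would be something like `out.mean(dim=1)`; whether that suits next-character prediction depends on the use case.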