If I understand it correctly, the input tensor shape for an RNN (with batch_first=True) is:
(batch_size x sequence_length x n_features)
batch_size: number of sequences (segments) in the batch
sequence_length: number of time steps to unroll
n_features: length of each one-hot-encoded vector, i.e. the vocab size.
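To check this shape convention, here is a minimal sketch, assuming `batch_first=True` as in my model below. The small sizes are only for illustration and are not my real data.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real batch would be much larger.
batch_size, sequence_length, n_features, hidden_dim = 4, 100, 52, 12

rnn = nn.RNN(n_features, hidden_dim, num_layers=2, batch_first=True)
x = torch.zeros(batch_size, sequence_length, n_features)

# With batch_first=True the output keeps one hidden vector per time step.
out, hidden = rnn(x)
print(out.shape)     # (batch_size, sequence_length, hidden_dim)
print(hidden.shape)  # (num_layers, batch_size, hidden_dim)
```

Note that the hidden state is always laid out as (num_layers, batch_size, hidden_dim), regardless of `batch_first`.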
I’ve got a dataset with 73461 total chars and a vocab size of 52. I want to pass in a sequence of 100 characters and predict the 101st character. That gives 73461 - 100 = 73361 sequences, so my input tensor is (73361 x 100 x 52).
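The windowing described above could be sketched roughly as follows. This is not my actual preprocessing code: the helper name `make_windows` and the random stand-in data are hypothetical, and I am assuming the text has already been encoded as integer character indices.

```python
import torch
import torch.nn.functional as F

def make_windows(encoded, seq_len, vocab_size):
    """Build (input, target) pairs from a 1-D LongTensor of character indices."""
    n = encoded.size(0) - seq_len  # number of windows that have a next character
    # Every length-seq_len window, minus the final one (it has no target).
    windows = encoded.unfold(0, seq_len, 1)[:n]
    inputs = F.one_hot(windows, num_classes=vocab_size).float()
    targets = encoded[seq_len:]  # the character that follows each window
    return inputs, targets

# Stand-in data: 300 random character indices instead of the real 73461.
encoded = torch.randint(0, 52, (300,))
inputs, targets = make_windows(encoded, seq_len=100, vocab_size=52)
print(inputs.shape, targets.shape)
```

With 73461 characters the same function would give inputs of shape (73361, 100, 52) and targets of shape (73361,), so there is one target index per sequence.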
I believe that my final layer should be a fully connected layer of shape (hidden_dim x output_size), whose output in this case is 7336100 x 52 (that is, 73361 * 100 rows).
However, I get this error at the loss function (I’m using nn.CrossEntropyLoss()):
ValueError: Expected input batch_size (7336100) to match target batch_size (73361).
Here are the shapes:
Model Architecture:
RNN(
  (rnn): RNN(52, 12, num_layers=2, batch_first=True)
  (fc): Linear(in_features=12, out_features=52, bias=True)
)
DEBUG
INPUT: torch.Size([73361, 100, 52])
MODEL OUTPUT: torch.Size([7336100, 52])
TARGET: torch.Size()
Below is the model architecture.
import torch
import torch.nn as nn

# The model
class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(RNN, self).__init__()
        # Params
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # The layers
        # Takes (n_batches x seq_length x n_features)
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        # Input size is the same as hidden_dim
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        # Initialize the hidden state first as a bunch of zeros.
        hidden = self.init_hidden(batch_size)
        # The outputs: out is (batch_size x seq_length x hidden_dim)
        out, hidden = self.rnn(x, hidden)
        # Flattens batch and time into (batch_size * seq_length) rows
        out = out.reshape(-1, self.hidden_dim)
        out = self.fc(out)
        return out, hidden

    def init_hidden(self, batch_size):
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
        return hidden
I’m not sure how to reshape my output so that its batch dimension matches the target.
Any help is greatly appreciated.
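For reference, one possible direction is to keep only the last time step of the RNN output before the linear layer, so that there is one prediction per sequence. This assumes only the 101st character is being predicted. The following is a rough sketch under that assumption, not a confirmed fix: the class name `LastStepRNN` and the small random batch are hypothetical.

```python
import torch
import torch.nn as nn

class LastStepRNN(nn.Module):
    """Variant of the model above that predicts from the final time step only."""

    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        # nn.RNN uses a zero initial hidden state when none is passed.
        out, hidden = self.rnn(x)  # out: (batch, seq_len, hidden_dim)
        last = out[:, -1, :]       # keep the last step: (batch, hidden_dim)
        return self.fc(last), hidden

model = LastStepRNN(input_size=52, output_size=52, hidden_dim=12, n_layers=2)
x = torch.randn(8, 100, 52)          # stand-in batch of 8 sequences
target = torch.randint(0, 52, (8,))  # one class index per sequence

logits, hidden = model(x)            # logits: (8, 52)
loss = nn.CrossEntropyLoss()(logits, target)
print(logits.shape, loss.item())
```

In this sketch the logits have shape (batch_size, 52) and the target has shape (batch_size,), which is the pairing nn.CrossEntropyLoss expects for class-index targets. Passing all 73361 sequences at once may also be memory-heavy, so smaller batches (for example through a DataLoader) might be worth considering.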