If I understand correctly, the input tensor for an RNN (with batch_first=True) has shape:
(batch_size x sequence_length x n_features)
batch_size: number of sequences (segments) to handle
sequence_length: number of time steps to unroll
n_features: length of each one-hot-encoded vector, i.e. the vocab size.
I’ve got a dataset with 73461 total characters and a vocab size of 52. I want to pass in a sequence of 100 characters and predict the 101st. That gives 73461 - 100 = 73361 windows, so my input tensor is (73361 x 100 x 52).
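To make the windowing concrete, here is a minimal sketch on a toy string, assuming a stride-1 sliding window and one-hot encoding via torch.nn.functional.one_hot. The helper name make_windows and the toy text are hypothetical and are not taken from my actual preprocessing code.

```python
import torch
import torch.nn.functional as F


def make_windows(text, seq_len):
    """Hypothetical helper: slide a window of seq_len chars over text (stride 1).

    Returns one-hot inputs of shape (n_windows, seq_len, vocab_size) and
    integer targets of shape (n_windows,), with n_windows = len(text) - seq_len.
    """
    vocab = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(vocab)}
    encoded = torch.tensor([char_to_idx[ch] for ch in text])

    n_windows = len(text) - seq_len
    # Row i holds the indices of characters i .. i + seq_len - 1.
    inputs = torch.stack([encoded[i:i + seq_len] for i in range(n_windows)])
    # The target for row i is the character right after its window.
    targets = encoded[seq_len:]

    one_hot = F.one_hot(inputs, num_classes=len(vocab)).float()
    return one_hot, targets


text = "hello world, hello rnn"  # toy stand-in for the real dataset
x, y = make_windows(text, seq_len=5)
print(x.shape, y.shape)
```

The printed shapes follow the same (n_windows x seq_len x vocab_size) layout as my input tensor, with one integer target per window.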
I believe that my final layer should be a fully connected layer of shape (hidden_dim x output_size), i.e. 12 x 52. Its output in this case is 7336100 x 52, because the reshape before it flattens all 73361 x 100 time steps into rows.
However, I get this error from the loss function (I’m using nn.CrossEntropyLoss()):
ValueError: Expected input batch_size (7336100) to match target batch_size (73361).
Here are the model summary and the tensor shapes:
Model Architecture: RNN(
  (rnn): RNN(52, 12, num_layers=2, batch_first=True)
  (fc): Linear(in_features=12, out_features=52, bias=True)
)
DEBUG
INPUT: torch.Size([73361, 100, 52])
MODEL OUTPUT: torch.Size([7336100, 52])
TARGET: torch.Size([73361])
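The mismatch can be reproduced with much smaller numbers. This is a sketch, assuming scaled-down sizes (8 windows of 5 steps) in place of my real ones. Flattening every time step yields batch * seq_len rows, while the target has only batch entries.

```python
import torch
import torch.nn as nn

# Scaled-down stand-ins for 73361 windows and 100 steps (assumed sizes).
batch, seq_len, vocab, hidden_dim = 8, 5, 52, 12

rnn = nn.RNN(vocab, hidden_dim, num_layers=2, batch_first=True)
fc = nn.Linear(hidden_dim, vocab)
criterion = nn.CrossEntropyLoss()

x = torch.zeros(batch, seq_len, vocab)
target = torch.zeros(batch, dtype=torch.long)

out, _ = rnn(x)                          # (batch, seq_len, hidden_dim)
flat = fc(out.reshape(-1, hidden_dim))   # (batch * seq_len, vocab)
print(flat.shape, target.shape)

try:
    criterion(flat, target)
except (ValueError, RuntimeError) as err:
    # 40 input rows vs 8 targets; the exception type may vary by PyTorch version.
    print(err)
```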
Below is the model architecture.
# The model
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(RNN, self).__init__()
        # Params
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # The layers
        # Takes (batch_size x seq_length x n_features)
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        # Input size of the linear layer is the same as hidden_dim
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        # Initialize the hidden state first as a bunch of zeros.
        hidden = self.init_hidden(batch_size)
        # out: (batch_size x seq_length x hidden_dim)
        out, hidden = self.rnn(x, hidden)
        # Flatten all time steps into rows: (batch_size * seq_length x hidden_dim)
        out = out.reshape(-1, self.hidden_dim)
        out = self.fc(out)
        return out, hidden

    def init_hidden(self, batch_size):
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
        return hidden
I’m not sure how to reshape my output so that it matches the target.
Any help is greatly appreciated.
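One possible direction, offered as a sketch rather than a confirmed fix: if only the 101st character is predicted per window, the linear layer could be applied to the last time step alone, so the output has one row per window. The class name LastStepRNN is hypothetical, and the batch is scaled down to 8 windows.

```python
import torch
import torch.nn as nn


class LastStepRNN(nn.Module):
    """Hypothetical variant of the model above that scores only the final time step."""

    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        # out: (batch, seq_len, hidden_dim); the hidden state defaults to zeros.
        out, hidden = self.rnn(x)
        # Keep only the last time step instead of flattening all of them.
        last = out[:, -1, :]             # (batch, hidden_dim)
        return self.fc(last), hidden     # logits: (batch, output_size)


model = LastStepRNN(input_size=52, output_size=52, hidden_dim=12, n_layers=2)
x = torch.zeros(8, 100, 52)                  # 8 windows instead of 73361
target = torch.zeros(8, dtype=torch.long)    # one class index per window

logits, _ = model(x)
loss = nn.CrossEntropyLoss()(logits, target)
print(logits.shape, loss.item())
```

The alternative would be to keep every time step and supply a target for each one (shape batch x seq_len, flattened to match), which trains on next-character prediction at every position instead of only the last.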