I want to build a model that predicts the next character based on the previous characters.
I have sliced the text into sequences of integers of length 100 (using a Dataset and DataLoader); a minimal sketch of that slicing is below.
I want to predict a character at each timestep.
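For context, here is roughly how that slicing could look; the names `text` and `char_to_idx` and the stride-1 windowing are my assumptions, not details from my actual code:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CharDataset(Dataset):
    """Slices integer-encoded text into (input, target) windows of length 100,
    where the target is the input shifted one character to the right."""
    def __init__(self, text, char_to_idx, seq_len=100):
        self.seq_len = seq_len
        # encode the whole corpus as a 1-D tensor of integer character ids
        self.data = torch.tensor([char_to_idx[c] for c in text], dtype=torch.long)

    def __len__(self):
        # every start index must leave room for the window plus the shifted target
        return len(self.data) - self.seq_len - 1

    def __getitem__(self, i):
        x = self.data[i : i + self.seq_len]          # shape: (seq_len,)
        y = self.data[i + 1 : i + self.seq_len + 1]  # shape: (seq_len,), shifted by one
        return x, y

# hypothetical usage, assuming `text` is the raw corpus string:
# char_to_idx = {c: i for i, c in enumerate(sorted(set(text)))}
# loader = DataLoader(CharDataset(text, char_to_idx), batch_size=128, shuffle=True)
```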
The dimensions of my input and target variables are:

inputs: (batch_size, sequence_length), in my case (128, 100)
targets: (batch_size, sequence_length), in my case (128, 100)
After the forward pass, the dimension of my predictions is (batch_size, sequence_length, vocabulary_size), which in my case is (128, 100, 44).
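A minimal sketch of a model that produces output of that shape; the embedding and hidden sizes (64 and 128) are placeholders I made up, not values from my real model:

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Embedding -> LSTM -> Linear, emitting one score per class at every timestep."""
    def __init__(self, vocab_size=44, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):                # x: (batch, seq)
        h, _ = self.lstm(self.embed(x))  # h: (batch, seq, hidden)
        return self.fc(h)                # (batch, seq, vocab)

model = CharLSTM()
print(model(torch.randint(44, (128, 100))).shape)  # torch.Size([128, 100, 44])
```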
But when I calculate my loss using:
```python
import torch
import torch.nn as nn

batch_size = 128
sequence_length = 100
number_of_classes = 44

# creates a random tensor with the shape of the model output
output = torch.rand(batch_size, sequence_length, number_of_classes)
# creates a tensor with random targets
target = torch.randint(number_of_classes, (batch_size, sequence_length)).long()

# define the loss function and calculate the loss
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
```
I get an error:
```
ValueError: Expected target size (128, 44), got torch.Size([128, 100])
```
My question is: how should I handle the loss calculation for many-to-many LSTM prediction, where I want to predict a character at each timestep? According to the nn.CrossEntropyLoss documentation, the input must have shape (N, C, d1, d2, …, dK), where N is the batch size and C is the number of classes.
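For concreteness, here is a sketch of the two usual ways I understand the shapes can be made to satisfy that requirement (this is my reading of the docs, not a verified fix for my full training loop): either move the class dimension into position 1, or flatten the batch and time dimensions.

```python
import torch
import torch.nn as nn

batch_size, sequence_length, number_of_classes = 128, 100, 44
output = torch.rand(batch_size, sequence_length, number_of_classes)
target = torch.randint(number_of_classes, (batch_size, sequence_length)).long()
criterion = nn.CrossEntropyLoss()

# option 1: permute to (N, C, d1) so the class dimension sits in position 1
loss_permuted = criterion(output.permute(0, 2, 1), target)

# option 2: flatten to (N*d1, C) logits and (N*d1,) targets
loss_flattened = criterion(output.reshape(-1, number_of_classes), target.reshape(-1))

print(loss_permuted, loss_flattened)
```

With the default reduction='mean', both variants average the per-character losses over all batch and timestep positions, so they should give the same value.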