I’m trying to train a character-level RNN, with the difference that each token (character) gets classified into one of m classes (multi-class classification). My input has the shape (batch_size, seq_len, num_classes) because I’m using one-hot encoding, and the output has the shape (batch_size, seq_len, num_classes) with softmax applied to dim 2. Here’s my model:
import torch.nn as nn

class CharRNN(nn.Module):  # class name is illustrative; the post only shows the methods
    def __init__(self, seq_len, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.op = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, x):
        x, h = self.rnn(x)               # x: (batch, seq_len, hidden_size) with batch_first=True
        return self.softmax(self.op(x))  # (batch, seq_len, num_classes) log-probabilities
But when I use NLLLoss() as the criterion with batch_size=32, seq_len=101 and num_classes=32, I get the following error:
ValueError: Expected target size (32, 32), got torch.Size([32, 101, 32])
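For reference, here is a minimal shape-only snippet that reproduces the error, with random tensors standing in for my actual data (the exact error type and wording may differ between PyTorch versions):

import torch
import torch.nn as nn

criterion = nn.NLLLoss()
log_probs = torch.randn(32, 101, 32).log_softmax(dim=2)      # (batch, seq_len, num_classes)
onehot_targets = torch.zeros(32, 101, 32, dtype=torch.long)  # one-hot labels, same shape as the output
# For a 3-D input, NLLLoss treats dim 1 as the class dimension, so it expects a (32, 32)
# target of class indices here and rejects the one-hot (32, 101, 32) target:
loss = criterion(log_probs, onehot_targets)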
Please let me know whether my understanding is correct or I’m making a logical mistake somewhere. Thanks.
After x, h = self.rnn(x), x has the shape (batch, seq_len, hidden_size), in your case (32, 101, hidden_size), since you set batch_first=True and the output contains the hidden states for ALL 101 time steps.
The most obvious solution would be to use only the last time step. You can either take x[:, -1, :] or the last hidden state h[-1]; in your case, both will have the shape (batch, hidden_size), i.e. (32, hidden_size).
The output of the self.op layer will then be (batch_size, num_classes), in your case (32, 32), which is exactly what you want.
The last change you need to make is to use dim=1 in the definition of your LogSoftmax layer.
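Putting it together, a minimal sketch of how the forward pass could look with these changes (only the last time step gets classified):

def forward(self, x):
    x, h = self.rnn(x)      # x: (batch, seq_len, hidden_size)
    x = x[:, -1, :]         # last time step: (batch, hidden_size); h[-1] is equivalent for a single-layer, unidirectional RNN
    x = self.op(x)          # (batch, num_classes)
    return self.softmax(x)  # now nn.LogSoftmax(dim=1)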
Thanks for your response. If I only use the last output or hidden state from the RNN, I get just one prediction per sequence, whereas I want a prediction for each character in the sequence (seq_len = 101). Also, to clarify: the output shape I mentioned is actually the shape of the labels, so my input and labels have the same shape, (batch_size, seq_len, num_classes).
You might want to look at this post, it seems very related. The linked Udacity tutorial is also exactly about a character-level RNN.
@vdw, the suggestion worked. I used the Udacity notebook to change the outputs and final activations in my model. The labels were changed from (batch_size, seq_len, num_classes) to (batch_size, seq_len) after I removed the one-hot encoding for the targets, so they are now plain class indices. During training the raw logit outputs go into the loss function, and at inference time I apply Softmax to turn the model outputs into probabilities. I guess that’s the only way to do it.
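For anyone who finds this later, this is roughly the pattern I ended up with (just a sketch; model, inputs, labels and num_classes stand in for my own code, and the loss shown is nn.CrossEntropyLoss since it takes raw logits directly):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                  # works on raw logits and class-index targets

logits = model(inputs)                             # (batch, seq_len, num_classes), no softmax inside the model
loss = criterion(logits.reshape(-1, num_classes),  # (batch * seq_len, num_classes)
                 labels.reshape(-1))               # (batch * seq_len,) integer class indices

# At inference time, convert logits to per-character probabilities:
probs = torch.softmax(logits, dim=2)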