Unsure of output dimension and Loss type (Newbie)

I’m trying to train a character-level RNN, with the difference that each token (character) gets classified into one of m classes (multi-class classification). My input has the shape (batch_size, seq_len, num_classes) because I’m using one-hot encoding, and the output has the shape (batch_size, seq_len, num_classes) with softmax applied to dim 2. Here’s my model:

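import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, seq_len, input_size, hidden_size, output_size):
        super(CharRNN, self).__init__()
        # batch_first=True: input and output are (batch, seq_len, features)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.op = nn.Linear(hidden_size, output_size)
        # log-probabilities over the class dimension (dim 2)
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, x):
        x, h = self.rnn(x)  # x: (batch, seq_len, hidden_size)
        return self.softmax(self.op(x))  # (batch, seq_len, output_size)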

But when I use NLLLoss() as the criterion with batch_size=32, seq_len=101, num_classes=32, I get the following error:

ValueError: Expected target size (32, 32), got torch.Size([32, 101, 32])

Please tell me whether my understanding is correct or whether I’m making a logical mistake somewhere. Thanks.


x, h = self.rnn(x)

Since you set batch_first=True, x has the shape (batch, seq_len, hidden_size), in your case (32, 101, hidden_size), since the output contains the hidden states for ALL 101 time steps.

The most obvious solution would be to use only the last hidden state. So you can either do

  • self.op(x[:, -1]) or
  • self.op(h[-1])

Both x[:, -1] and h[-1] will have the shape (batch, hidden_size), in your case (32, hidden_size).

The output of the self.op layer will be (batch_size, num_classes), in your case (32, 32), exactly what you want.

The last change you need to make is to change dim=2 to dim=1 in your definition of the LogSoftmax layer.
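
Put together, the last-time-step variant could look roughly like the sketch below. This is only an illustration under assumptions: the class name LastStepRNN, the hidden size of 64, and the random tensors are made up, and it gives one prediction per sequence rather than one per character. Since the model uses batch_first=True, the sketch takes h[-1], which is (batch, hidden_size) regardless of that flag.

```python
import torch
import torch.nn as nn

class LastStepRNN(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.op = nn.Linear(hidden_size, output_size)
        # the class dimension is now dim 1, because seq_len is gone
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        # out: (batch, seq_len, hidden_size), h: (num_layers, batch, hidden_size)
        out, h = self.rnn(x)
        # h[-1] is the final hidden state of the last layer: (batch, hidden_size)
        return self.log_softmax(self.op(h[-1]))

torch.manual_seed(0)
model = LastStepRNN(input_size=32, hidden_size=64, output_size=32)

x = torch.randn(32, 101, 32)            # stand-in input: (batch, seq_len, features)
target = torch.randint(0, 32, (32,))    # one class index per sequence

log_probs = model(x)                    # (32, 32)
loss = nn.NLLLoss()(log_probs, target)  # scalar
```

For a single-layer, unidirectional nn.RNN, out[:, -1] and h[-1] hold the same values, so either can feed the linear layer.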

Hi @vdw
thanks for your response. If I only use the last output or hidden state from the RNN, I only get one prediction per sequence, whereas I want a prediction for each character in the sequence (seq_len = 101). Also, to clarify: the output shape I mentioned is actually the shape of the labels. So my input and labels have the same shape (batch_size, seq_len, num_classes).
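
For what it's worth, the error message follows from how NLLLoss reads shapes: it treats dim 1 as the class dimension and expects the target to hold class indices, not one-hot vectors. A (32, 101, 32) input is therefore read as 101 classes over 32 positions, which is why a (32, 32) target is expected. Below is a minimal sketch of per-character NLLLoss, assuming one-hot labels as described above; the random tensors are stand-ins, not real data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size, seq_len, num_classes = 32, 101, 32
torch.manual_seed(0)

# Stand-in for the model output: log-probabilities, (batch, seq_len, num_classes)
log_probs = F.log_softmax(torch.randn(batch_size, seq_len, num_classes), dim=2)

# Stand-in one-hot labels with the same shape as the model output
class_ids = torch.randint(0, num_classes, (batch_size, seq_len))
one_hot_labels = F.one_hot(class_ids, num_classes).float()

# NLLLoss wants class indices: (batch, seq_len)
targets = one_hot_labels.argmax(dim=2)

criterion = nn.NLLLoss()

# Option 1: move the class dimension to dim 1 -> input (batch, num_classes, seq_len)
loss = criterion(log_probs.permute(0, 2, 1), targets)

# Option 2: flatten batch and time -> input (batch * seq_len, num_classes)
loss_flat = criterion(log_probs.reshape(-1, num_classes), targets.reshape(-1))
```

Both options average over all batch_size * seq_len characters, so they should give the same loss value.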

You might want to look at this post; it seems closely related. The linked Udacity tutorial is also exactly about a character RNN.

@vdw, the suggestion worked.
I used Udacity’s notebook to change the outputs and final activation in my model. The labels (what I called outputs earlier) were changed from (batch_size, seq_len, num_classes) to (batch_size, seq_len) after I removed the one-hot encoding for them. During training, the raw logit outputs are passed to the loss function, and at inference I apply softmax to turn the model outputs into probabilities. I guess that’s the only way to do it.
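
For anyone landing here later, a rough sketch of that setup is below. It is not the Udacity notebook's code. It assumes nn.CrossEntropyLoss as the criterion (it takes raw logits and applies log-softmax internally), and the class name CharTagger, the hidden size, the optimizer, and the random tensors are placeholders.

```python
import torch
import torch.nn as nn

class CharTagger(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.op = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)   # (batch, seq_len, hidden_size)
        return self.op(out)    # raw logits: (batch, seq_len, output_size)

torch.manual_seed(0)
batch_size, seq_len, num_classes = 32, 101, 32

model = CharTagger(num_classes, 64, num_classes)
criterion = nn.CrossEntropyLoss()  # assumed loss; expects logits and class indices
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(batch_size, seq_len, num_classes)                # stand-in input
targets = torch.randint(0, num_classes, (batch_size, seq_len))   # class indices

# Training step: flatten batch and time so every character is one sample
logits = model(x)
loss = criterion(logits.reshape(-1, num_classes), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference: softmax only here, to read the logits as probabilities
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(x), dim=2)   # (batch, seq_len, num_classes)
    predictions = probs.argmax(dim=2)        # (batch, seq_len)
```

Keeping LogSoftmax in the model and using NLLLoss with class-index targets should give the same loss; the logits version just keeps the softmax out of the training path.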