Hi
I’m trying to train a character-level RNN, with the difference that each token (character) gets classified into one of m classes (multi-class classification). My input has the shape (batch_size, seq_len, num_classes) because I’m using one-hot encoding, and the output has the shape (batch_size, seq_len, num_classes) with softmax applied to dim 2. Here’s my model
x has the shape (seq_len, batch, hidden_size), in your case (101, 32, hidden_size), since the output contains the hidden states for ALL 101 time steps.
The most obvious solution would be to use only the last hidden state. So you can either do
self.op(x[-1]) or
self.op(h[-1])
Both x[-1] and h[-1] will have the shape (batch, hidden_size), in your case (32, hidden_size).
The output of the self.op layer will be (batch_size, num_classes), in your case (32, 32), exactly what you want.
The last change you need to make is to change dim=2 to dim=1 in your definition of the Softmax layer.
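For reference, here is a minimal sketch of the last-step approach described above. The original model code is not shown in the thread, so the class name LastStepClassifier, the choice of nn.GRU, and hidden_size=64 are assumptions for illustration only; the names self.op, x and h follow the ones used above.

```python
import torch
import torch.nn as nn

class LastStepClassifier(nn.Module):
    """Hypothetical model: one prediction per sequence, from the last time step."""

    def __init__(self, num_classes=32, hidden_size=64):
        super().__init__()
        # Default batch_first=False, so inputs are (seq_len, batch, input_size).
        self.rnn = nn.GRU(input_size=num_classes, hidden_size=hidden_size)
        self.op = nn.Linear(hidden_size, num_classes)
        # Output is (batch, num_classes), so the class dimension is dim=1.
        self.softmax = nn.Softmax(dim=1)

    def forward(self, inp):
        # x: (seq_len, batch, hidden_size), hidden states for all time steps
        # h: (num_layers, batch, hidden_size), final hidden state per layer
        x, h = self.rnn(inp)
        return self.softmax(self.op(x[-1]))

torch.manual_seed(0)
model = LastStepClassifier()
inp = torch.zeros(101, 32, 32)   # (seq_len, batch, num_classes)
x, h = model.rnn(inp)
out = model(inp)                 # (batch, num_classes)
```

For a single-layer, unidirectional RNN like this one, x[-1] and h[-1] hold the same values, which is why either can be passed to self.op.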
Hi @vdw
Thanks for your response. If I only use the last output or hidden state from the RNN, I get only one prediction per sequence, whereas I want a prediction for each character in the sequence (seq_len is 101). Also, to clarify: the output shape I mentioned is actually the shape of the labels. So my input and labels have the same shape, (batch_size, seq_len, num_classes).
@vdw, The suggestion worked
I used Udacity’s notebook as a reference to change the outputs and final activation in my model. The targets went from shape (batch_size, seq_len, num_classes) to (batch_size, seq_len) after I removed the one-hot encoding from them. During training, the raw logits are passed to the loss function, and at inference time I apply Softmax to turn the model outputs into probabilities. I guess that’s the only way to do it.
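A minimal sketch of this setup, for anyone landing here later. It is not the code from the notebook: the class name PerCharClassifier, the use of nn.GRU with batch_first=True, hidden_size=64, and the random data are all assumptions for illustration. It assumes nn.CrossEntropyLoss as the loss, which takes raw logits and integer class indices, so the logits are flattened to (batch_size * seq_len, num_classes) and the labels to (batch_size * seq_len).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerCharClassifier(nn.Module):
    """Hypothetical model: one prediction per character in the sequence."""

    def __init__(self, num_classes=32, hidden_size=64):
        super().__init__()
        self.rnn = nn.GRU(num_classes, hidden_size, batch_first=True)
        self.op = nn.Linear(hidden_size, num_classes)

    def forward(self, inp):
        x, _ = self.rnn(inp)   # (batch_size, seq_len, hidden_size)
        return self.op(x)      # raw logits: (batch_size, seq_len, num_classes)

torch.manual_seed(0)
batch_size, seq_len, num_classes = 32, 101, 32

# Random stand-in data: one-hot inputs, integer class-index labels.
chars = torch.randint(0, num_classes, (batch_size, seq_len))
inp = F.one_hot(chars, num_classes).float()   # (batch_size, seq_len, num_classes)
labels = torch.randint(0, num_classes, (batch_size, seq_len))

model = PerCharClassifier(num_classes)
criterion = nn.CrossEntropyLoss()   # applies log-softmax internally

# Training: feed raw logits to the loss, flattened over batch and time.
logits = model(inp)
loss = criterion(logits.reshape(-1, num_classes), labels.reshape(-1))
loss.backward()

# Inference: apply softmax over the class dimension to get probabilities.
with torch.no_grad():
    probs = torch.softmax(model(inp), dim=-1)
    preds = probs.argmax(dim=-1)    # (batch_size, seq_len)
```

The Linear layer is applied to every time step at once because nn.Linear acts on the last dimension, so no loop over the sequence is needed.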