Classifying names with a character-level RNN

I’m trying to tailor the tutorial towards my particular need, but I am not getting predictable and consistent output. It’s my first time using neural networks so excuse the nature of my questions.

I have 3 labeled datasets with 27,666 samples in total. I train the model on 80% of them (set1=8000, set2=5821, set3=8312) and evaluate on the remaining 20% to calculate the prediction accuracy for each category.

The main parameters I am tweaking are:

  1. n_iter
  2. learning_rate
  3. n_hidden
  4. n_confusion

At the moment n_iter equals the size of the training split (the full 80% of the data), and n_hidden and n_confusion are fixed to the values from the tutorial. I am varying the learning rate between 0.00005 and 0.001.

Questions:

  1. What is the link between the train function (that uses the learning rate variable) and the evaluate function (that is used in the prediction)?
  2. Out of the 4 parameters above, which one has the most effect on the accuracy of prediction?
  3. How can I add more linear layers?

  1. The train function is used to train the model, i.e. to tune its parameters so that the training loss decreases. The evaluate function is used to measure the current accuracy of your model on the validation dataset, which was not used for training. In the best case the validation accuracy approximates the test accuracy, i.e. the final accuracy your model will have on new, unseen data in your future application.

  2. Probably all except number 4, which only sets how many random samples are drawn to fill your confusion matrix.
    n_iters defines the number of iterations used to train your model. After a while you might see that the training loss does not decrease anymore or that the validation loss starts to increase. In that case you could stop the training, as your model will most likely overfit the training data from that point on (stopping here is called early stopping).
    learning_rate defines the step size your optimizer uses to change the parameters based on the gradients. Some optimizers also have other parameters, like momentum and running estimates.
    n_hidden basically defines the model capacity in this example. If your model has more capacity, it might learn the training dataset better, but it might also overfit more easily. If you see a good training accuracy and a bad validation accuracy, you might want to reduce the capacity or add regularization techniques.

Note that these explanations are really scratching the surface and there is a ton of papers about each step, so please take this info with a grain of salt. :wink:
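To make the early stopping idea a bit more concrete, here is a minimal sketch in plain Python. The helper name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all made up for illustration; they are not part of the tutorial. The idea is simply to remember the best validation loss so far and stop once it has not improved for `patience` checks in a row.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training "stops" at the first epoch where the validation loss has not
    improved by more than min_delta for `patience` consecutive epochs.
    If that never happens, the last epoch is returned as the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and keep training.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one per evaluation pass (illustrative only).
val_losses = [1.10, 0.95, 0.90, 0.92, 0.93, 0.97, 1.05]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stop at epoch {stop_epoch}, best model was at epoch {best_epoch}")
```

In practice you would also save the model parameters whenever a new best loss is found, so you can restore them after stopping.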

  3. Where would you like to add this additional layer? If you want it after the output layer, you could use this code:
import torch
import torch.nn as nn
import torch.nn.functional as F


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        # Extra linear layer on the output path
        self.additional_lin = nn.Linear(output_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        # The ReLU sits between the two linear layers. Applying it after the
        # last layer would clamp the logits to be non-negative before LogSoftmax.
        output = F.relu(self.i2o(combined))
        output = self.additional_lin(output)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

As you can see, you would just have to define this layer in __init__ with the right number of input and output features and apply it in forward.
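To tie this back to the first question, here is a rough, self-contained sketch of how such a model could be run through one training step and one evaluation pass. It loosely follows the structure of the tutorial's train and evaluate functions but is not a copy of them. The sizes (57 letters, 128 hidden units, 3 categories), the random input line, and passing `rnn`, `criterion` and `learning_rate` as arguments are assumptions for illustration. In this sketch the ReLU is placed between the two linear layers, so the final logits are not clamped before LogSoftmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RNN(nn.Module):
    """Tutorial-style RNN cell with one extra linear layer on the output path."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.additional_lin = nn.Linear(output_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = F.relu(self.i2o(combined))
        output = self.softmax(self.additional_lin(output))
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)


def train(rnn, criterion, category_tensor, line_tensor, learning_rate):
    """One training step: forward over the line, backprop, manual SGD update."""
    hidden = rnn.initHidden()
    rnn.zero_grad()
    for i in range(line_tensor.size(0)):
        output, hidden = rnn(line_tensor[i], hidden)
    loss = criterion(output, category_tensor)
    loss.backward()
    # This is the only place where learning_rate matters: it scales the update.
    with torch.no_grad():
        for p in rnn.parameters():
            p.add_(p.grad, alpha=-learning_rate)
    return output, loss.item()


def evaluate(rnn, line_tensor):
    """Forward pass only: no gradients, no parameter updates."""
    with torch.no_grad():
        hidden = rnn.initHidden()
        for i in range(line_tensor.size(0)):
            output, hidden = rnn(line_tensor[i], hidden)
    return output


# Assumed sizes: 57 letters and 128 hidden units, 3 categories for the 3 datasets.
n_letters, n_hidden, n_categories = 57, 128, 3
torch.manual_seed(0)
rnn = RNN(n_letters, n_hidden, n_categories)
criterion = nn.NLLLoss()

# A random one-hot encoded "name" of 5 characters: shape (seq_len, 1, n_letters).
line_tensor = torch.zeros(5, 1, n_letters)
line_tensor[torch.arange(5), 0, torch.randint(n_letters, (5,))] = 1.0
category_tensor = torch.tensor([1])  # hypothetical label

output, loss = train(rnn, criterion, category_tensor, line_tensor, learning_rate=0.005)
log_probs = evaluate(rnn, line_tensor)
predicted_category = log_probs.argmax(dim=1).item()
```

So train changes the weights (scaled by learning_rate), while evaluate only reads them to produce a prediction. That is the whole link between the two.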