RNN is not learning even simple concepts

I’ve been trying to make an RNN that predicts the next character in a string. (I’m trying to do it without the built-in models because I thought I would learn more that way.)

Unfortunately, the model seems to learn only the rough distribution of letters; it hasn’t even realised that you shouldn’t put numbers right next to letters. It is as if it can’t see the input or the hidden layers. Given that it has ~400k trainable parameters, I think it should be doing better.

Things I’ve checked:

  1. The input is getting through (checked with the print statement commented out below)
  2. The hidden layers are being passed on
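
A more direct way to test both points is to ask autograd whether the output actually depends on the input and on the previous hidden state. The sketch below is only an illustration: `TinyCell` and `dependency_norms` are hypothetical names, and the one-layer cell is a stand-in with the same `(input, hidden) -> (output, hidden)` interface rather than the model from this post. A gradient norm of zero for either argument would suggest that it never reaches the output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyCell(nn.Module):
    """Hypothetical one-layer stand-in for the recurrent cell in this post."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.in_to_h = nn.Linear(input_size + hidden_size, hidden_size)
        self.h_to_out = nn.Linear(hidden_size, output_size)

    def forward(self, x, h):
        h = torch.tanh(self.in_to_h(torch.cat((x, h), 1)))
        return torch.log_softmax(self.h_to_out(h), dim=1), h

def dependency_norms(cell, x, h):
    """Gradient norms of one output log-probability w.r.t. input and hidden state."""
    x = x.clone().requires_grad_(True)
    h = h.clone().requires_grad_(True)
    out, _ = cell(x, h)
    out[0, 0].backward()  # any scalar function of the output works here
    return x.grad.norm().item(), h.grad.norm().item()

cell = TinyCell(input_size=5, hidden_size=8, output_size=5)
x = torch.zeros(1, 5)
x[0, 2] = 1.0             # one-hot input character (index chosen arbitrarily)
h = torch.zeros(1, 8)     # initial hidden state

grad_x, grad_h = dependency_norms(cell, x, h)
print(f"grad wrt input: {grad_x:.4f}, grad wrt hidden: {grad_h:.4f}")
```

The same helper can be adapted to a model that keeps its hidden state in a list by calling `requires_grad_` on each entry and inspecting each `.grad` after the backward pass.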

There may also be a problem with my understanding of Python, as I am not very experienced with the language.

Here’s the model definition; the other files are in this zip because I didn’t want to make this post overly long. I’ve replaced the loading of training data with just a couple of hard-coded strings because the dataset is quite large.

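import torch
import torch.nn as nn

# Assumption: `device` is defined in one of the other files; this is a typical definition.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class RNN(nn.Module):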
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size
        
        self.in_to_h = nn.Linear(input_size + hidden_size, hidden_size).to(device)
        self.h1_to_h2 = nn.Linear(hidden_size + hidden_size, hidden_size).to(device)
        self.h2_to_h3 = nn.Linear(hidden_size + hidden_size, hidden_size).to(device)
        self.h_to_out = nn.Linear(hidden_size, output_size).to(device)
        self.softmax = nn.LogSoftmax(dim=1).to(device)

        self.act = nn.Tanh()

    def forward(self, input, hidden):
        # Debug: print the current input character (all_letters is defined in another file)
        # print(all_letters[torch.argmax(input, 1).cpu().numpy()[0]])
        hidden[0] = self.act(self.in_to_h(torch.cat((input, hidden[0]), 1)))
        hidden[1] = self.act(self.h1_to_h2(torch.cat((hidden[0], hidden[1]), 1)))
        hidden[2] = self.act(self.h2_to_h3(torch.cat((hidden[1], hidden[2]), 1)))

        output = self.h_to_out(hidden[2])
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        # One zero tensor per hidden layer; the Variable wrapper is not needed in PyTorch >= 0.4.
        return [torch.zeros(1, self.hidden_size, device=device) for _ in range(3)]
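
Since the model definition on its own looks plausible, a common sanity check is to try to overfit a single short string: if the loss does not fall even then, the problem is likely in the training loop rather than in the model. The following is a minimal sketch under stated assumptions, not the training code from the zip: `TinyCell`, the string `"abcabcabcabc"`, the Adam optimizer, and the hyperparameters are all chosen for illustration. It resets the hidden state at the start of each pass over the string, sums the `NLLLoss` over the characters, and calls `backward()` once per pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

text = "abcabcabcabc"            # hypothetical hard-coded training string
alphabet = sorted(set(text))
n_letters = len(alphabet)
hidden_size = 16

def one_hot(ch):
    """Encode one character as a (1, n_letters) one-hot tensor."""
    v = torch.zeros(1, n_letters)
    v[0, alphabet.index(ch)] = 1.0
    return v

class TinyCell(nn.Module):
    """Hypothetical one-layer stand-in with the same call signature as RNN.forward."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.in_to_h = nn.Linear(input_size + hidden_size, hidden_size)
        self.h_to_out = nn.Linear(hidden_size, output_size)

    def forward(self, x, h):
        h = torch.tanh(self.in_to_h(torch.cat((x, h), 1)))
        return torch.log_softmax(self.h_to_out(h), dim=1), h

cell = TinyCell(n_letters, hidden_size, n_letters)
optimizer = torch.optim.Adam(cell.parameters(), lr=0.01)
criterion = nn.NLLLoss()         # expects log-probabilities, matching LogSoftmax

losses = []
for step in range(200):
    hidden = torch.zeros(1, hidden_size)   # fresh hidden state for each sequence
    optimizer.zero_grad()
    loss = 0.0
    # Predict each next character from the current one and the running hidden state.
    for cur, nxt in zip(text[:-1], text[1:]):
        output, hidden = cell(one_hot(cur), hidden)
        target = torch.tensor([alphabet.index(nxt)])
        loss = loss + criterion(output, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item() / (len(text) - 1))

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

If the per-character loss drops on a toy string like this but not with the real training script, comparing the two loops (targets, hidden-state reset, where `zero_grad` and `step` are called) is a reasonable next step.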