I’ve been trying to make an RNN that predicts the next character in a string. (I’m trying to do it without the built-in models because I thought I would learn more that way.)
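To make the task concrete, here is a toy illustration of what I mean by next-character prediction (this is just the idea, not my real data pipeline): for each character in a training string, the target is simply the character that follows it.

```python
# Toy illustration of the next-character task: each input character's
# training target is the character that follows it in the string.
text = "ab1cd"
pairs = [(text[i], text[i + 1]) for i in range(len(text) - 1)]
print(pairs)  # [('a', 'b'), ('b', '1'), ('1', 'c'), ('c', 'd')]
```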
Unfortunately, the model only seems to learn the rough distribution of letters; it hasn’t even realised that you shouldn’t put numbers right next to letters. It is as if it can’t see the input or the hidden layers. Given that it has ~400k trainable parameters, I think it should be doing better.
Things I’ve checked:
- The input is getting through (verified by uncommenting the print statement in the code below)
- The hidden states are being passed on between steps
There may also be a problem with my understanding of Python, as I am not very experienced with this programming language.
Here’s the model definition; the other files are in this zip because I didn’t want to make this post overly long. I’ve replaced the loading of training data with a couple of hard-coded strings, because the dataset is quite large.
```python
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.in_to_h = nn.Linear(input_size + hidden_size, hidden_size).to(device)
        self.h1_to_h2 = nn.Linear(hidden_size + hidden_size, hidden_size).to(device)
        self.h2_to_h3 = nn.Linear(hidden_size + hidden_size, hidden_size).to(device)
        self.h_to_out = nn.Linear(hidden_size, output_size).to(device)
        self.softmax = nn.LogSoftmax(dim=1).to(device)
        self.act = nn.Tanh()

    def forward(self, input, hidden):
        # Print input
        #print(all_letters[torch.argmax(input, 1).cpu().numpy()])
        hidden = self.act(self.in_to_h(torch.cat((input, hidden), 1)))
        hidden = self.act(self.h1_to_h2(torch.cat((hidden, hidden), 1)))
        hidden = self.act(self.h2_to_h3(torch.cat((hidden, hidden), 1)))
        output = self.h_to_out(hidden)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        hidden = []
        for i in range(3):
            hidden.append(Variable(torch.zeros(1, self.hidden_size, device=device)))
        return hidden
```
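For context, the inputs to `forward` are one-hot character vectors of shape `(1, n_letters)`. `all_letters` is built in one of the files in the zip; the sketch below is an approximation of my encoding, not the exact vocabulary.

```python
import string

import torch

# Approximation of my encoding setup; the real all_letters is built in
# another file, so treat this vocabulary as a stand-in.
all_letters = string.ascii_letters + string.digits + " .,;'"
n_letters = len(all_letters)

def char_to_tensor(ch):
    # One-hot row vector of shape (1, n_letters), matching the
    # (batch=1, input_size) layout that forward() expects.
    t = torch.zeros(1, n_letters)
    t[0, all_letters.index(ch)] = 1.0
    return t

x = char_to_tensor("a")
print(x.shape)  # torch.Size([1, n_letters])
```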