Problem
I am going through this PyTorch tutorial on the official PyTorch website, which classifies names into their languages of origin without using torch.nn.RNN, instead building the recurrence from the basic building block torch.nn.Linear (as can be seen here):
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)
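To make the data flow concrete, each step simply concatenates the one-hot input with the previous hidden state before both linear layers. A minimal sketch at the tensor level (the sizes 57 and 128 are just the ones the tutorial happens to use, chosen here for illustration):

```python
import torch

input_size, hidden_size = 57, 128  # illustrative sizes, as in the tutorial
x = torch.zeros(1, input_size)
x[0, 0] = 1.0                      # one-hot encoding of a single letter
h = torch.zeros(1, hidden_size)    # initial hidden state, as in initHidden()

# this is the `combined` vector that feeds both i2h and i2o
combined = torch.cat((x, h), 1)
print(combined.shape)              # torch.Size([1, 185])
```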
As much as I find this implementation insightful, its formulation seems inconsistent with the formal definition found in Goodfellow's Deep Learning book (page 374).
Specifically,
- There is no tanh() applied to the updated hidden state.
- The output is computed from the combined vector rather than from the updated hidden state.
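To make the discrepancy concrete, here is a sketch of how I would expect the cell to look if it followed the textbook formulation (tanh nonlinearity on the hidden update, output read from the new hidden state). The class name VanillaRNN and the layer name h2o are my own, not the tutorial's:

```python
import torch
import torch.nn as nn

class VanillaRNN(nn.Module):
    """Sketch of a textbook-style RNN cell: h_t = tanh(W [x_t; h_{t-1}] + b),
    with the output computed from h_t instead of from the combined vector."""
    def __init__(self, input_size, hidden_size, output_size):
        super(VanillaRNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)  # reads the *new* hidden state
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = torch.tanh(self.i2h(combined))    # tanh applied to the hidden update
        output = self.softmax(self.h2o(hidden))    # output from the updated hidden state
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)
```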
However, according to the tutorial, the results do seem to be good. I am not sure whether there is a formal name for this architecture.
Could someone help me? Thank you in advance.