I was implementing the little part of speech tagger from the tutorial and I was wondering how I could transform this class into a Bi-Directional LSTM.
I am completely new to PyTorch, so I don’t know if there is an easy way to do so.
Thanks a lot!
PyTorch RNN classes have a bidirectional keyword flag, so this is quite easy. You should simply change the class to:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F


class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim (per direction).
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)  # <- change here
        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim * 2, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # Before we've done anything, we don't have any hidden state.
        # Refer to the PyTorch documentation to see exactly
        # why they have this dimensionality.
        # The axes semantics are (num_layers * num_directions, minibatch_size, hidden_dim)
        return (autograd.Variable(torch.zeros(2, 1, self.hidden_dim)),
                autograd.Variable(torch.zeros(2, 1, self.hidden_dim)))  # <- change here: first dim of hidden needs to be doubled

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, self.hidden = self.lstm(
            embeds.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
The two lines I changed are marked with # <- change here.
Edit: I changed another line I had missed (the hidden2tag layer); see below for details.
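For anyone who wants to try the bidirectional tagger quickly, here is a minimal sketch of a compact variant run on a toy sentence. It is not the tutorial's code: the class name BiLSTMTagger, the sizes (vocabulary of 9, 3 tags, dimensions of 6) and the word indices are assumptions made up for illustration. It also does not store the hidden state; when no initial state is passed, nn.LSTM starts from zeros.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMTagger(nn.Module):  # hypothetical name, not from the tutorial
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)
        # Forward and backward outputs are concatenated, hence hidden_dim * 2.
        self.hidden2tag = nn.Linear(hidden_dim * 2, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        # No initial (h_0, c_0) is passed, so nn.LSTM defaults to zeros.
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        return F.log_softmax(tag_space, dim=1)


# Toy sizes and word indices, assumed for illustration only.
model = BiLSTMTagger(embedding_dim=6, hidden_dim=6, vocab_size=9, tagset_size=3)
sentence = torch.tensor([0, 1, 2, 3, 4], dtype=torch.long)
tag_scores = model(sentence)
print(tag_scores.shape)  # torch.Size([5, 3]): one row of log-probabilities per word
```

Each row of tag_scores holds log-probabilities over the tags for one word, so taking the argmax along dim=1 gives the predicted tag index per word.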
Hello Simon,
Thanks a lot, this is way simpler than I thought, but I get the following error when I run the small example after changing the lines you indicated:
RuntimeError: size mismatch, m1: [5 x 12], m2: [6 x 3]
Any idea why?
Oh right. I forgot that the outputs also have both directions. Then your self.hidden2tag = nn.Linear(hidden_dim, tagset_size) should become self.hidden2tag = nn.Linear(hidden_dim * 2, tagset_size). Hope that this works!
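To see where the doubling comes from, here is a small sketch comparing the output shapes of a unidirectional and a bidirectional nn.LSTM. The sizes (5 words, embedding and hidden dimension of 6) are assumptions chosen to match the numbers in the error message above.

```python
import torch
import torch.nn as nn

# Toy sizes, assumed for illustration (5 words, dims of 6).
seq_len, embedding_dim, hidden_dim = 5, 6, 6

uni = nn.LSTM(embedding_dim, hidden_dim)
bi = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)

# Input layout is (seq_len, batch, input_size) since batch_first is False.
inputs = torch.randn(seq_len, 1, embedding_dim)

uni_out, (uni_h, uni_c) = uni(inputs)
bi_out, (bi_h, bi_c) = bi(inputs)

print(uni_out.shape)  # torch.Size([5, 1, 6])
print(bi_out.shape)   # torch.Size([5, 1, 12]): both directions concatenated
print(bi_h.shape)     # torch.Size([2, 1, 6]): one hidden state per direction
```

After lstm_out.view(len(sentence), -1) the bidirectional output is a 5 x 12 matrix, which is the m1 in the error, while the old nn.Linear(hidden_dim, tagset_size) still expected 6 input features.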
Working like a charm, thank you so much!