Hi, I am currently trying to reproduce this paper.
The authors built a BiLSTM model for Named Entity Recognition (NER), with a character-level CNN and a part-of-speech (POS) embedding trained as part of the deep neural structure. I have come across multiple implementations with the character-level embedding as part of the bigger model, see example code here. I've decided to adopt the repository linked and implement my own POS embedding (which is updated as the model trains, rather than pre-trained).
However, I'm finding it extremely hard to understand which parameters to set when integrating a POS embedding into the LSTM model. I have been experimenting with my implementation, but I'm at a loss as to what parameters to set for model initialisation and for the forward function. I do not typically work with NNs at such a granular level.
Below is a snippet of the existing code from the repository I've adopted, together with my attempted integration. I have marked in the comments which embeddings already exist and which lines I've added.
The error occurs because of mismatched tensor dimensions. I can't elaborate much further, as I'm honestly unsure why I set my parameters the way I did in the first place. I've tried to model the POS embedding on the character embedding, but the two are fundamentally different.
Any help would be honestly appreciated; I have been stuck on this issue for a long time now. Whether it be pointing me to an existing implementation or providing additional insights, thank you so much.
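To make the failure concrete, here is a minimal standalone reproduction of the dimension error. The sizes are made up to match the prints in my code below, and the bare LSTM stands in for self.pos_lstm:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=20, hidden_size=12, bidirectional=True)

# The character pathway feeds a 3-D (seq_len, batch, input_size) tensor -- this works
ok = torch.randn(21, 57 * 511, 20)
_, (h, c) = lstm(ok)
print(h.size())  # torch.Size([2, 29127, 12])

# My POS view collapses the embedding to 2-D before the transpose -- this fails
pos_emb = torch.randn(57, 511, 20)
bad = pos_emb.view(-1, pos_emb.size(2)).transpose(0, 1)  # shape (20, 29127), not 3-D
try:
    lstm(bad)
except RuntimeError as e:
    print("RuntimeError:", e)
```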
Initialisation of the Model
# Word embedding
self.word2vec_emb = nn.Embedding(self.config.nwords, self.config.dim_word2vec, padding_idx=0)
# Character embedding
self.char_embeddings = nn.Embedding(self.config.nchars, self.config.dim_char, padding_idx=0)
self.char_lstm = nn.LSTM(self.config.dim_char, self.config.hidden_size_char, bidirectional=True)
# Problematic Implementation of POS embedding
self.pos_embeddings = nn.Embedding(self.config.npos, self.config.dim_pos, padding_idx=0)
self.pos_lstm = nn.LSTM(self.config.dim_pos, self.config.hidden_size_pos, bidirectional=True)
self.dropout = nn.Dropout(p=self.config.dropout)
self.word_lstm = nn.LSTM(self.config.dim_glove + 2*self.config.hidden_size_char + 2*self.config.hidden_size_pos, self.config.hidden_size_lstm, bidirectional=True)
Forward Function
# Word Embedding
word_emb = self.word2vec_emb(word_input) # shape = S x B x dim_word2vec
print("word_emb start", word_emb.size()) # torch.Size([511, 57, 200])
# Character LSTM
char_emb = self.char_embeddings(char_input.view(-1, char_input.size(2)))
char_emb = char_emb.view(*char_input.size(), -1) #dim = BxSxWxE
print("char_emb", char_emb.size()) # torch.Size([57, 511, 21, 10])
_, (h, c) = self.char_lstm(char_emb.view(-1, char_emb.size(2), char_emb.size(3)).transpose(0,1)) #(num_layers * num_directions, batch, hidden_size) = 2*BS*cnh
char_output = torch.cat((h[0], h[1]), 1) #shape = BS*2cnh
char_output = char_output.view(char_emb.size(0), char_emb.size(1), -1).transpose(0,1) #shape = S*B*2cnh
# Problematic Integration of POS LSTM (Tried copying from Character LSTM)
pos_emb = self.pos_embeddings(pos_input.view(-1, pos_input.size(1))) # the view is a no-op here: pos_input is already B x S
print(pos_emb.size()) # torch.Size([57, 511, 20])
_, (h, c) = self.pos_lstm(pos_emb.view(-1, pos_emb.size(2)).transpose(0,1)) # ERROR HERE: this view collapses pos_emb to 2-D (20 x B*S), but nn.LSTM expects a 3-D (seq_len, batch, input_size) tensor
pos_output = torch.cat((h[0], h[1]), 1) #shape = B*S x 2*hidden_size_pos
pos_output = pos_output.view(pos_emb.size(0), pos_emb.size(1), -1).transpose(0,1)
# Concat all output
word_emb = torch.cat((word_emb, char_output, pos_output ), 2)
output, (h, c) = self.word_lstm(word_emb)
output = self.linear(output)
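For what it's worth, here is a minimal self-contained sketch of what I currently suspect the fix is: since POS tags come one per word (unlike characters, which come several per word), the POS embedding can be looked up per token and concatenated directly, with no separate POS LSTM, which would make word_lstm's input size dim_word2vec + 2*hidden_size_char + dim_pos. The sizes below are placeholders I made up, not the repository's config, and I'm not sure this matches the paper:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, not the repository's config values
S, B = 5, 3          # sequence length, batch size
npos, dim_pos = 18, 20
dim_word = 200

pos_embeddings = nn.Embedding(npos, dim_pos, padding_idx=0)

pos_input = torch.randint(1, npos, (S, B))  # one POS tag per word, shaped like word_input
word_emb = torch.randn(S, B, dim_word)      # stand-in for the word embedding output

pos_emb = pos_embeddings(pos_input)         # (S, B, dim_pos) -- no extra LSTM needed
combined = torch.cat((word_emb, pos_emb), dim=2)
print(combined.size())                      # torch.Size([5, 3, 220])
```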