How to integrate POS as part of a BiLSTM model?

Hi, I am currently in the midst of recreating this paper.

The authors built a BiLSTM model and trained a character-level CNN and a part-of-speech (POS) embedding as part of this deep neural structure for Named Entity Recognition (NER). I have come across multiple implementations with the character-level embedding as part of the bigger model; see example code here. I’ve decided to adopt the linked repository and implement my own POS embedding (which is updated as the model trains, not pre-trained).

However, I’m finding it extremely hard to understand which parameters to set when integrating a POS embedding into the LSTM model. I have been experimenting with my implementation, but I’m at a loss as to which parameters to set for model initialisation and the forward function. I do not typically work with NNs at such a granular level.

Below is a snippet of the existing code from the repository I’ve adopted my implementation from, together with my attempted integration. The comments indicate which embeddings already existed and which code I’ve added.

The error occurs due to mismatched tensor dimensions. I can’t elaborate much further, as I’m honestly unsure why I set my parameters the way I did in the first place. I’ve tried to recreate the POS embeddings similarly to the character ones, but they are fundamentally different.

Help would be honestly appreciated, whether it be pointing me to an existing implementation or providing additional insights; I have been dwelling on this issue for a long time now. Thank you so much.

Initialisation of the Model

# Word embedding
self.word2vec_emb = nn.Embedding(self.config.nwords, self.config.dim_word2vec, padding_idx=0)

# Character embedding
self.char_embeddings = nn.Embedding(self.config.nchars, self.config.dim_char, padding_idx=0)
self.char_lstm = nn.LSTM(self.config.dim_char, self.config.hidden_size_char, bidirectional=True)

# Problematic Implementation of POS embedding
self.pos_embeddings = nn.Embedding(self.config.npos, self.config.dim_pos, padding_idx=0)
self.pos_lstm = nn.LSTM(self.config.dim_pos, self.config.hidden_size_pos, bidirectional=True)

self.dropout = nn.Dropout(p=self.config.dropout)
# Input size = word embedding + char BiLSTM features + POS BiLSTM features
self.word_lstm = nn.LSTM(self.config.dim_word2vec+2*self.config.hidden_size_char+2*self.config.hidden_size_pos, self.config.hidden_size_lstm, bidirectional=True)

Forward Function

# Word embedding
word_emb = self.word2vec_emb(word_input) # shape = S*B*wnh
print("word_emb start", word_emb.size()) # torch.Size([511, 57, 200])

# Character LSTM
char_emb = self.char_embeddings(char_input.view(-1, char_input.size(2)))
char_emb = char_emb.view(*char_input.size(), -1) #dim = BxSxWxE
print("char_emb", char_emb.size()) # Torch.tensor 57, 511, 21, 10

_, (h, c) = self.char_lstm(char_emb.view(-1, char_emb.size(2), char_emb.size(3)).transpose(0,1)) #(num_layers * num_directions, batch, hidden_size) = 2*BS*cnh
char_output = torch.cat((h[0], h[1]), 1) #shape = BS*2cnh
char_output = char_output.view(char_emb.size(0), char_emb.size(1), -1).transpose(0,1) #shape = S*B*2cn

# Problematic Integration of POS LSTM (Tried copying from Character LSTM)
pos_emb = self.pos_embeddings(pos_input.view(-1, pos_input.size(1)))
print(pos_emb.size()) # torch.Size([57, 511, 20])

_, (h, c) = self.pos_lstm(pos_emb.view(-1, pos_emb.size(2)).transpose(0,1)) # ERROR HERE
pos_output = torch.cat((h[0], h[1]), 1) #shape = BS*2cnh
pos_output = pos_output.view(pos_emb.size(0), pos_emb.size(1), -1).transpose(0,1)

# Concat all output
word_emb = torch.cat((word_emb, char_output, pos_output), 2)

output, (h, c) = self.word_lstm(word_emb)
output = self.linear(output)

Based on the posted code snippet, it seems you are currently running into a shape mismatch in:

_, (h, c) = self.pos_lstm(pos_emb.view(-1, pos_emb.size(2)).transpose(0,1)) # ERROR HERE

Let’s try to break down what we are expecting in this part of the code.

Both layers are defined as:

# Problematic Implementation of POS embedding
self.pos_embeddings = nn.Embedding(self.config.npos, self.config.dim_pos, padding_idx=0)
self.pos_lstm = nn.LSTM(self.config.dim_pos, self.config.hidden_size_pos, bidirectional=True)

nn.Embedding expects an input as:

IntTensor or LongTensor of arbitrary shape containing the indices to extract

while nn.LSTM expects:

… (L, N, H_in) when batch_first=False
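As a quick sanity check, here is a minimal standalone sketch of both expectations (the sizes below, e.g. num_embeddings=18 and embedding_dim=20, are made up for illustration, not taken from your config):

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=18, embedding_dim=20, padding_idx=0)
lstm = nn.LSTM(input_size=20, hidden_size=32, bidirectional=True)

idx = torch.randint(1, 18, (57, 511))   # LongTensor of arbitrary shape: [batch_size, seq_len]
x = emb(idx)                            # [57, 511, 20]: one embedding vector per index
out, (h, c) = lstm(x.permute(1, 0, 2))  # the LSTM wants [L, N, H_in] = [511, 57, 20]
print(out.shape)                        # torch.Size([511, 57, 64]) = [L, N, 2*hidden_size]
print(h.shape)                          # torch.Size([2, 57, 32]) = [num_directions, N, hidden_size]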

You are using these layers here:

# Problematic Integration of POS LSTM (Tried copying from Character LSTM)
pos_emb = self.pos_embeddings(pos_input.view(-1, pos_input.size(1)))
print(pos_emb.size()) # torch.Size([57, 511, 20])

Based on the pos_emb shape of [57, 511, 20] and the previous comments, I assume this tensor uses the layout [batch_size, seq_len, features], where features = embedding_dim = self.config.dim_pos = 20.

The error is raised when this output is passed to self.pos_lstm, so let’s take a look at the reshaping operations and break them down into separate steps:

pos_emb # [batch_size, seq_len, features]
tmp = pos_emb.view(-1, pos_emb.size(2)) # [batch_size * seq_len, features] (you are flattening the temporal dimension into the batch dimension: is this really what you want?)
tmp.transpose(0, 1) # [features, batch_size * seq_len]
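
To make the mismatch concrete, here is the same sequence of operations on a dummy tensor (illustrative sizes only):

import torch

pos_emb = torch.randn(57, 511, 20)        # [batch_size, seq_len, features]
tmp = pos_emb.view(-1, pos_emb.size(2))   # [29127, 20]: temporal dim folded into the batch dim
tmp = tmp.transpose(0, 1)                 # [20, 29127]
# Depending on the PyTorch version, nn.LSTM either rejects a 2D input outright
# or treats it as an unbatched [L, H_in] sequence; either way, the trailing
# dimension (29127) no longer matches input_size=20, so the call fails.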

As mentioned before, the expected shape of the input to this LSTM layer is [L=seq_len, N=batch_size, H_in=features].
If my first assumption is correct (pos_emb # [batch_size, seq_len, features]), you would need to use:

self.pos_lstm(pos_emb.permute(1, 0, 2).contiguous())

to permute the dimensions into [seq_len, batch_size, features] before passing it to the LSTM.
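
For completeness, here is how that fix could look dropped into your forward pass (a sketch assuming pos_input is a [batch_size, seq_len] tensor of tag indices, one per token):

pos_emb = self.pos_embeddings(pos_input)         # [57, 511, 20] = [N, L, H_in]
pos_emb = pos_emb.permute(1, 0, 2).contiguous()  # [511, 57, 20] = [L, N, H_in]
pos_output, (h, c) = self.pos_lstm(pos_emb)      # pos_output: [L, N, 2*hidden_size_pos]

Note that pos_output already contains one feature vector per token, matching the [seq_len, batch_size, features] layout of word_emb and char_output that you concatenate along dim 2; reusing the final hidden state h (as in the character branch) would instead give you a single vector per sentence rather than one per token.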
