Hi everyone,
I have been reading articles on biaffine parsing for several days, but I still can't understand how the biaffine module should be implemented. My parser currently scores with an affine MLP, and I can't work out how to change it to a biaffine one.
The structure of my parser is as follows:
Parser(
(dropout): Dropout(p=0.6, inplace=False)
(word_embedding): Embedding(381, 100, padding_idx=0)
(tag_embedding): Embedding(17, 40, padding_idx=0)
(bilstm): LSTM(908, 600, num_layers=3, batch_first=True, dropout=0.3, bidirectional=True)
(bilstm_to_hidden1): Linear(in_features=1200, out_features=500, bias=True)
(hidden1_to_hidden2): Linear(in_features=500, out_features=150, bias=True)
(hidden2_to_pos): Linear(in_features=150, out_features=45, bias=True)
(hidden2_to_dep): Linear(in_features=150, out_features=41, bias=True)
)
where Embedding = nn.Embedding and Linear = nn.Linear.
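To make the setup concrete, here is (roughly) the tail of my forward pass, simplified to just the scoring part; the ReLU activations and tensor names are placeholders, the dimensions match the module printout above:

```python
import torch
import torch.nn as nn

# Simplified tail of the forward pass: BiLSTM output -> MLP -> POS / dep scores
bilstm_to_hidden1 = nn.Linear(1200, 500)
hidden1_to_hidden2 = nn.Linear(500, 150)
hidden2_to_pos = nn.Linear(150, 45)
hidden2_to_dep = nn.Linear(150, 41)

lstm_out = torch.randn(2, 10, 1200)  # (batch, seq_len, 2 * 600) from the BiLSTM
h1 = torch.relu(bilstm_to_hidden1(lstm_out))
h2 = torch.relu(hidden1_to_hidden2(h1))
pos_scores = hidden2_to_pos(h2)  # (batch, seq_len, 45), one score per tag
dep_scores = hidden2_to_dep(h2)  # (batch, seq_len, 41), one score per relation
```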
Now, what do I need to change to implement the biaffine layer?
I think that these two layers:
self.bilstm_to_hidden1 = nn.Linear(...)
self.hidden1_to_hidden2 = nn.Linear(...)
are simply MLPs consisting of a single Linear layer each.
So I think the right thing to do is replace
self.hidden2_to_pos = nn.Linear(...)
self.hidden2_to_dep = nn.Linear(...)
with a BiAffine module. What I found as a "standard" implementation in PyTorch is the following:
class BiAffine(nn.Module):
    """Biaffine attention layer."""

    def __init__(self, input_dim, output_dim):
        super(BiAffine, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.U = nn.Parameter(torch.FloatTensor(output_dim, input_dim, input_dim))
        nn.init.xavier_uniform_(self.U)  # note: xavier_uniform (no underscore) is deprecated

    def forward(self, Rh, Rd):
        # Rh, Rd: (batch, seq_len, input_dim)
        Rh = Rh.unsqueeze(1)                    # (batch, 1, seq_len, input_dim)
        Rd = Rd.unsqueeze(1)                    # (batch, 1, seq_len, input_dim)
        S = Rh @ self.U @ Rd.transpose(-1, -2)  # (batch, output_dim, seq_len, seq_len)
        return S.squeeze(1)
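For reference, this is how I understand the layer is meant to be called (a self-contained shape check; the 150/41 dimensions match my parser, the head/dependent inputs are random placeholders, and I use xavier_uniform_ since the non-underscore version is deprecated):

```python
import torch
import torch.nn as nn

class BiAffine(nn.Module):
    """Self-contained copy of the layer above, for shape checking."""
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.U = nn.Parameter(torch.empty(output_dim, input_dim, input_dim))
        nn.init.xavier_uniform_(self.U)

    def forward(self, Rh, Rd):
        Rh = Rh.unsqueeze(1)                    # (batch, 1, seq, dim)
        Rd = Rd.unsqueeze(1)                    # (batch, 1, seq, dim)
        S = Rh @ self.U @ Rd.transpose(-1, -2)  # (batch, output_dim, seq, seq)
        return S.squeeze(1)                     # drop the class dim when output_dim == 1

Rh = torch.randn(2, 10, 150)  # head representations: (batch, seq_len, dim)
Rd = torch.randn(2, 10, 150)  # dependent representations

arc_scores = BiAffine(150, 1)(Rh, Rd)   # output_dim=1  -> (2, 10, 10) arc scores
lab_scores = BiAffine(150, 41)(Rh, Rd)  # output_dim=41 -> (2, 41, 10, 10) label scores
print(arc_scores.shape, lab_scores.shape)
```

So with output_dim=1 it would score every head/dependent pair, and with output_dim=41 it would score every pair for each of my 41 relations, if I understand it correctly.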
but it doesn’t seem to work.
What am I doing wrong?
Do you have any ideas / suggestions?
Do you know of any code / module that I could use as a baseline for implementing this?
I'm running out of ideas.
Thanks a lot to everyone.