Hi guys,
I'm currently working on an NLP project. I only have a small number of samples (~3k), but I got decent results with BERT. The problem is a regression task: each document is rated by users, and the mean of these ratings is my target, e.g. text1 → 0.3242, text2 → 2.4232, and so on. For my baseline, I built the following model:
import torch
import torch.nn as nn
import torch.nn.functional as F
import transformers

class BERT_BASELINE(nn.Module):
    def __init__(self):
        super(BERT_BASELINE, self).__init__()
        self.bert = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.drop = nn.Dropout(0.3)
        self.out1 = nn.Linear(768, 256)
        self.relu = nn.ReLU()
        self.out2 = nn.Linear(256, 1)

    def forward(self, ids, mask, token_type_ids):
        # pooler output of the [CLS] token, shape (batch, 768)
        _, output = self.bert(ids, attention_mask=mask,
                              token_type_ids=token_type_ids, return_dict=False)
        output = self.drop(output)
        output = self.out1(output)
        output = self.relu(output)
        output = self.out2(output)  # one regression value per document
        return output
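For context, I train it as a plain regression; roughly like the following sketch (AdamW, the learning rate, and the batch dict keys are placeholders here, not my exact setup):

def train_step(model, batch, optimizer, loss_fn):
    # One training step; targets are the mean user ratings, shape (batch, 1).
    model.train()
    optimizer.zero_grad()
    preds = model(batch['ids'], batch['mask'], batch['token_type_ids'])
    loss = loss_fn(preds, batch['targets'])
    loss.backward()
    optimizer.step()
    return loss.item()

model = BERT_BASELINE()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed lr
loss_fn = nn.MSELoss()  # RMSE is then just the root of the eval MSE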
In a second step I engineered a set of "traditional" NLP features on the same dataset (369 features in total). Now I want to combine them with the BERT output:
class BERT_ADVANCED(nn.Module):
    def __init__(self):
        super(BERT_ADVANCED, self).__init__()
        self.bert = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.drop = nn.Dropout(0.1)
        self.out1 = nn.Linear(768 + 369, 1300)
        self.out2 = nn.Linear(1300, 256)
        self.out3 = nn.Linear(256, 1)

    def forward(self, ids, mask, token_type_ids, features):
        # pooler output of the [CLS] token, shape (batch, 768)
        _, output = self.bert(ids, attention_mask=mask,
                              token_type_ids=token_type_ids, return_dict=False)
        output = self.drop(output)
        # concatenate the BERT embedding with the 369 hand-crafted features
        output_final = torch.cat((output, features), dim=1)
        output_final = F.relu(self.out1(output_final))
        output_final = F.relu(self.out2(output_final))
        output_final = self.out3(output_final)
        return output_final
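To make the tensor shapes concrete, a forward pass looks like this (dummy inputs; the feature tensor has to be a float tensor of shape (batch, 369) for the concatenation to work):

# Shape check with a dummy batch of 2 documents.
tokenizer = transformers.BertTokenizer.from_pretrained('bert-base-uncased')
enc = tokenizer(['some document', 'another document'],
                padding=True, truncation=True, max_length=128,
                return_tensors='pt')
features = torch.randn(2, 369)  # stand-in for the hand-crafted features

model = BERT_ADVANCED()
preds = model(enc['input_ids'], enc['attention_mask'],
              enc['token_type_ids'], features)
print(preds.shape)  # torch.Size([2, 1])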
However, my results (in terms of RMSE) don't really improve over the baseline. I have two questions:
- Am I missing some steps after combining the BERT output and my own features?
- Does it make sense to feed the hand-crafted features through an MLP before combining them? A sketch of what I mean is below.
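Something like this (just a sketch; the 128-dim projection, BatchNorm, and dropout rate are arbitrary choices to illustrate the idea):

class FeatureMLP(nn.Module):
    # Projects the 369 raw features into a smaller learned space
    # before they are concatenated with the BERT embedding.
    def __init__(self, in_dim=369, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(in_dim),      # rescale heterogeneous features
            nn.Linear(in_dim, out_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
        )

    def forward(self, features):
        return self.net(features)

# In BERT_ADVANCED.forward, the concat would then become:
#   feats = self.feature_mlp(features)                 # (batch, 128)
#   output_final = torch.cat((output, feats), dim=1)   # (batch, 768 + 128)
# and out1 would be nn.Linear(768 + 128, 1300) instead.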
Thanks!