Multimodal: Combine Hugging Face outputs with tabular features

Hi guys,
I'm currently working on an NLP project. I have only a small number of samples (~3k) but got decent results with BERT. The problem is a regression task: each document is rated by users, and the mean of these ratings is my target. For example, text1 - 0.3242, text2 - 2.4232, and so on. For my baseline, I created the following model:

import torch.nn as nn
import transformers


class BERT_BASELINE(nn.Module):

    def __init__(self):
        super(BERT_BASELINE, self).__init__()
        self.bert = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.drop = nn.Dropout(0.3)
        self.out1 = nn.Linear(768, 256)
        self.relu = nn.ReLU()
        self.out2 = nn.Linear(256, 1)

    def forward(self, ids, mask, token_type_ids):
        # With return_dict=False, BertModel returns (sequence_output, pooled_output);
        # only the pooled output (batch_size x 768) is used here.
        _, output = self.bert(ids, attention_mask=mask, token_type_ids=token_type_ids, return_dict=False)
        output = self.drop(output)
        output = self.out1(output)
        output = self.relu(output)
        output = self.out2(output)  # one regression value per document
        return output

In a second step, I created a lot of “traditional” NLP features on the same dataset (369 features in total). Now I want to combine them with the BERT outputs:

import torch
import torch.nn as nn
import torch.nn.functional as F
import transformers


class BERT_ADVANCED(nn.Module):

    def __init__(self):
        super(BERT_ADVANCED, self).__init__()
        self.bert = transformers.BertModel.from_pretrained('bert-base-uncased')
        self.drop = nn.Dropout(0.1)
        # 768 pooled BERT dimensions + 369 hand-crafted features
        self.out1 = nn.Linear(768 + 369, 1300)
        self.out2 = nn.Linear(1300, 256)
        self.out3 = nn.Linear(256, 1)

    def forward(self, ids, mask, token_type_ids, features):
        # With return_dict=False, BertModel returns (sequence_output, pooled_output)
        _, output = self.bert(ids, attention_mask=mask, token_type_ids=token_type_ids, return_dict=False)
        output = self.drop(output)
        # Concatenate along the feature dimension: (batch_size, 768 + 369)
        output_final = torch.cat((output, features), dim=1)
        output_final = F.relu(self.out1(output_final))
        output_final = F.relu(self.out2(output_final))
        output_final = self.out3(output_final)
        return output_final

However, my results (in terms of RMSE) don't really improve. I have two questions:

  1. Am I missing some steps after combining the BERT output and my own features?
  2. Does it make sense to feed the hand-crafted features through an MLP before combining them?
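
To make question 2 concrete, this is roughly what I have in mind. It is only a minimal sketch that I have not tried on my data: the class name FeatureFusionHead, the parameter feat_hidden and all layer sizes are made up for illustration, and the random tensors stand in for the pooled BERT output and my 369 features.

```python
import torch
import torch.nn as nn


class FeatureFusionHead(nn.Module):
    """Hypothetical head: sends the hand-crafted features through a small MLP
    before concatenating them with the pooled BERT output."""

    def __init__(self, bert_dim=768, n_features=369, feat_hidden=128, hidden=256):
        super().__init__()
        # Small MLP for the hand-crafted features (sizes are arbitrary guesses)
        self.feat_mlp = nn.Sequential(
            nn.Linear(n_features, feat_hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        # Regression layers on top of the concatenated representation
        self.regressor = nn.Sequential(
            nn.Linear(bert_dim + feat_hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pooled, features):
        feat = self.feat_mlp(features)            # (batch, feat_hidden)
        fused = torch.cat((pooled, feat), dim=1)  # (batch, bert_dim + feat_hidden)
        return self.regressor(fused)              # (batch, 1)


# Shape check with random tensors standing in for real inputs
head = FeatureFusionHead()
pooled = torch.randn(4, 768)     # placeholder for the pooled BERT output
features = torch.randn(4, 369)   # placeholder for the hand-crafted features
prediction = head(pooled, features)
print(prediction.shape)
```

In the full model, pooled would be the second element returned by self.bert(...) as in BERT_ADVANCED above, and this head would replace out1 to out3.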

Thanks!