Combine sparse features with pre-trained word embeddings

Hi, I’m trying to create a working example of RASA’s Dual Intent & Entity Transformer (DIET, https://arxiv.org/pdf/2004.09936.pdf) model in PyTorch, and I’m having trouble combining sparse features with dense features.

Basically, the paper embeds the sparse features and then combines them with BERT’s output.
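As far as I can tell from the paper, that combination step would look roughly like the sketch below. This is only my interpretation; the layer names, the sizes, and the concatenate-then-project choice are my own placeholders, not something taken from Rasa’s code:

    import torch
    import torch.nn as nn

    class SparseDenseCombiner(nn.Module):
        # My reading of the paper: project the sparse features into a dense space,
        # concatenate with the dense (BERT) features, and project the result to the
        # shared embedding size. sparse_dim / dense_dim / embedding_dim are placeholders.
        def __init__(self, sparse_dim, dense_dim, embedding_dim, dropout=0.1):
            super().__init__()
            self.sparse_proj = nn.Linear(sparse_dim, dense_dim)
            self.dropout = nn.Dropout(dropout)
            self.merge = nn.Linear(dense_dim * 2, embedding_dim)

        def forward(self, sparse_feats, dense_feats):
            # sparse_feats: (batch, seq_len, sparse_dim), e.g. multi-hot n-gram counts
            # dense_feats:  (batch, seq_len, dense_dim), e.g. BERT's last hidden state
            if sparse_feats.is_sparse:
                sparse_feats = sparse_feats.to_dense()  # nn.Linear needs a dense tensor
            sparse_emb = self.dropout(torch.relu(self.sparse_proj(sparse_feats)))
            combined = torch.cat([sparse_emb, dense_feats], dim=-1)
            return self.merge(combined)

Is something like this what the paper means, or is the combination done differently?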

My nn.Module has the following layers

        self.entities_list = ["O"] + config.entities
        self.num_entities = len(self.entities_list)
        self.intents_list = config.intents
        self.num_intents = len(self.intents_list)

        # projects BERT's hidden states down to the shared embedding size
        self.ffnn = nn.Linear(config.hidden_size, self.config.embedding_dim)
        self.transformer = TransformerModel(
            d_model=self.config.embedding_dim,
            nhead=8,
            nlayers=2,
            d_hid=self.config.max_token_len)

        # entity tagging head (CRF over the transformer output)
        tag_to_idx = prepare_tag_to_idx(self.entities_list)
        self.crf = BiLSTM_CRF(
            len(self.entities_list) + 3,
            emb_dim=self.config.embedding_dim)
        # intent classification head
        self.intents_classifier = nn.Linear(
            self.config.embedding_dim,
            self.num_intents)

My forward method accepts both the sparse features and the dense (BERT) features. My question is: how do I combine them?
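Concretely, I imagine the forward ending up something like the sketch below, but everything here is guesswork on my part, especially the sum-vs-concat choice and the hypothetical self.sparse_ffnn layer that isn’t in my __init__ above:

    # hypothetical extra layer, not shown in my __init__ above:
    # self.sparse_ffnn = nn.Linear(config.sparse_dim, self.config.embedding_dim)

    def forward(self, sparse_feats, dense_feats):
        dense_emb = self.ffnn(dense_feats)                       # (batch, seq, embedding_dim)
        sparse_emb = self.sparse_ffnn(sparse_feats.to_dense())   # (batch, seq, embedding_dim)
        combined = dense_emb + sparse_emb   # summing here; concat + another Linear is an option
        encoded = self.transformer(combined)
        intent_logits = self.intents_classifier(encoded[:, 0])   # classify on the first token
        entity_scores = self.crf(encoded)
        return intent_logits, entity_scores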

Thank you! I would also love it if you could point me to any resource that would help me fully assemble the model, as I’m having trouble finishing it (my current version gets terrible results).