CNN-LSTM architecture

Hi all,

I am trying to develop a CNN-LSTM model for text classification. Here are the __init__ and forward functions of my model:

def __init__(self, bert_config, device, dropout_rate, n_class, out_channel=16, lstm_hidden_size=None):
    super(CustomBertCNNLSTMModel, self).__init__()

    self.bert_config = bert_config
    self.dropout_rate = dropout_rate
    self.n_class = n_class
    self.out_channel = out_channel

    model_config = AutoConfig.from_pretrained(self.bert_config, output_hidden_states=True)
    self.bert = AutoModel.from_pretrained(self.bert_config, config=model_config)
    self.out_channels = self.bert.config.num_hidden_layers * self.out_channel
    self.tokenizer = AutoTokenizer.from_pretrained(self.bert_config, model_max_length=512)

    if not lstm_hidden_size:
        self.lstm_hidden_size = self.bert.config.hidden_size
    else:
        self.lstm_hidden_size = lstm_hidden_size

    # One group of out_channel filters per BERT layer; the kernel spans the full hidden size
    self.conv = nn.Conv2d(in_channels=self.bert.config.num_hidden_layers,
                          out_channels=self.out_channels,
                          kernel_size=(3, self.bert.config.hidden_size),
                          groups=self.bert.config.num_hidden_layers)
    self.lstm = nn.LSTM(self.bert.config.hidden_size, self.lstm_hidden_size, bidirectional=False)
    self.hidden_to_softmax1 = nn.Linear(self.out_channels, self.n_class, bias=True)
    self.hidden_to_softmax2 = nn.Linear(self.lstm_hidden_size * 2, n_class, bias=True)
    self.dropout = nn.Dropout(p=self.dropout_rate)
    self.device = device

def forward(self, sents):
    sents_tensor, masks_tensor, sents_lengths = sents_to_tensor(self.tokenizer, sents, self.device)
    encoded_layers = self.bert(input_ids=sents_tensor, attention_mask=masks_tensor, return_dict=True)
    # stack the per-layer outputs as channels: (batch_size, num_layers, seq_len, hidden_size)
    encoded_stack_layer = torch.stack(encoded_layers, 1)

    conv_out = self.conv(encoded_stack_layer)   # (batch_size, out_channels, seq_len - 2, 1)
    conv_out = torch.squeeze(conv_out, dim=3)   # (batch_size, out_channels, some_length)
    conv_out, _ = torch.max(conv_out, dim=2)    # (batch_size, out_channels)
    enc_hiddens, (last_hidden, last_cell) = self.lstm(
        pack_padded_sequence(conv_out, sents_lengths, enforce_sorted=False))
    output_hidden = torch.cat((last_hidden[0], last_hidden[1]), dim=1)
    output_hidden = self.dropout(output_hidden)
    pre_softmax = self.hidden_to_softmax2(output_hidden)

    return pre_softmax

I get the following error:

expected_input_dim, input.dim()))
RuntimeError: input must have 2 dimensions, got 1

The error is raised in this line:

enc_hiddens, (last_hidden, last_cell) = self.lstm(pack_padded_sequence(conv_out, sents_lengths,enforce_sorted=False))

I am really confused about feeding CNN output into an LSTM and building a hybrid model. Can someone kindly point me in the right direction?

If you print conv_out.shape, you might get a better idea, but I believe it is a 1D tensor per example (or 2D, if you count the batch dimension).

An LSTM expects input of shape (time_steps, batch_size, features) (with the default batch_first=False).
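To make that concrete, here is a minimal, self-contained sketch (with made-up sizes, no BERT involved). Max-pooling over dim=2 removes the sequence dimension entirely, so pack_padded_sequence reads the remaining two dimensions as (time_steps, batch_size) and packs them into 1D data, which is exactly what nn.LSTM then rejects:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Stand-in for conv_out after torch.max(conv_out, dim=2):
# the sequence dimension is gone, leaving (batch_size, out_channels)
pooled = torch.randn(5, 3)

# With the default batch_first=False, pack_padded_sequence reads dim 0 as
# time and dim 1 as batch, so a 2D input packs down to 1D data
lengths = torch.tensor([5, 4, 2])
packed = pack_padded_sequence(pooled, lengths, enforce_sorted=False)
print(packed.data.dim())  # 1

lstm = nn.LSTM(input_size=3, hidden_size=8)
lstm(packed)  # RuntimeError: input must have 2 dimensions, got 1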

I actually corrected the error, but I am not sure whether the architecture is correct. I would appreciate it if someone could clarify that.

You might want to explain what you're trying to do. Something like this? Trying to figure it out just from looking at your code, with no comments, is tedious; you cannot expect many replies that way.

I am trying to develop a hybrid CNN-LSTM architecture using BERT; I mentioned that in the description of the question. The posted code contains the __init__ and forward functions of the architecture. First the BERT embeddings are fed to the CNN layer, and then its output is fed to the LSTM layer, as in the sketch below.
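For completeness, here is a minimal sketch of one way that forward pass could be wired so the LSTM actually receives a sequence. This is an assumption-laden sketch, not the original author's code: it assumes sents_to_tensor returns padded ids, masks, and a 1D tensor of true lengths as in the original post; it takes the per-layer outputs from outputs.hidden_states (available because of output_hidden_states=True; the first entry is the embedding output, so it is skipped); and it keeps the convolution output as a sequence instead of max-pooling it away. This also means the LSTM in __init__ would take self.out_channels as its input size, and with bidirectional=False the final linear layer would be nn.Linear(self.lstm_hidden_size, n_class) rather than lstm_hidden_size * 2.

import torch
from torch.nn.utils.rnn import pack_padded_sequence

def forward(self, sents):
    sents_tensor, masks_tensor, sents_lengths = sents_to_tensor(self.tokenizer, sents, self.device)
    outputs = self.bert(input_ids=sents_tensor, attention_mask=masks_tensor, return_dict=True)

    # hidden_states holds num_hidden_layers + 1 tensors (embedding output first);
    # stack the layer outputs as channels: (batch, num_layers, seq_len, hidden_size)
    encoded_stack_layer = torch.stack(outputs.hidden_states[1:], dim=1)

    # The (3, hidden_size) kernel collapses the hidden dimension:
    # (batch, out_channels, seq_len - 2, 1) -> (batch, out_channels, seq_len - 2)
    conv_out = torch.squeeze(self.conv(encoded_stack_layer), dim=3)

    # Keep the sequence dimension and reorder for the LSTM:
    # (batch, out_channels, seq') -> (seq', batch, out_channels)
    conv_out = conv_out.permute(2, 0, 1)

    # The convolution shortens each sequence by kernel_size - 1 = 2
    # (assumes sents_lengths is a 1D tensor of the unpadded lengths)
    lengths = torch.clamp(sents_lengths - 2, min=1).cpu()
    packed = pack_padded_sequence(conv_out, lengths, enforce_sorted=False)
    _, (last_hidden, _) = self.lstm(packed)  # self.lstm = nn.LSTM(self.out_channels, self.lstm_hidden_size)

    # Unidirectional single-layer LSTM: last_hidden is (1, batch, lstm_hidden_size)
    output_hidden = self.dropout(last_hidden[-1])   # (batch, lstm_hidden_size)
    return self.hidden_to_softmax2(output_hidden)   # nn.Linear(self.lstm_hidden_size, n_class)

If the bidirectional concatenation implied by the original lstm_hidden_size * 2 linear layer is what you want, set bidirectional=True in the LSTM instead and concatenate last_hidden[-2] and last_hidden[-1].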