LSTM Multi-class sentence classification

Hi all,

Suppose I have a multi-class text classification problem and I want to use LSTMs. I found several code examples for this, but I'm confused because they differ in how the LSTM output is used. Some of them look like this:

def forward(self, x):
        h_embedding = self.embedding(x)                    # (batch, seq_len, embed_dim)
        h_embedding = torch.squeeze(torch.unsqueeze(h_embedding, 0))  # effectively a no-op unless some dim is 1

        h_lstm, _ = self.lstm(h_embedding)                 # (batch, seq_len, hidden), assuming batch_first=True
        avg_pool = torch.mean(h_lstm, 1)                   # mean over timesteps -> (batch, hidden)
        max_pool, _ = torch.max(h_lstm, 1)                 # max over timesteps -> (batch, hidden)
        conc = torch.cat((avg_pool, max_pool), 1)          # (batch, 2 * hidden)
        conc = self.relu(self.linear(conc))
        conc = self.dropout(conc)
        out = self.out(conc)                               # (batch, num_classes)
        return out

but I cannot understand what they did.

and others look like this:

  def forward(self, x):
        h_embedding = self.embedding(x)        # (batch, seq_len, embed_dim)
        h_lstm, _ = self.lstm(h_embedding)     # (batch, seq_len, hidden), assuming batch_first=True
        max_pool, _ = torch.max(h_lstm, 1)     # max over timesteps -> (batch, hidden)
        linear = self.relu(self.linear(max_pool))
        out = self.out(linear)                 # (batch, num_classes)
        return out

Why do they use max_pool?
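From what I can tell, torch.max(h_lstm, 1) (and torch.mean in the first snippet) pools over the time dimension, collapsing the per-timestep LSTM outputs into one fixed-size vector per sentence. A small shape check (the sizes here are just made up for illustration):

import torch

h_lstm = torch.randn(4, 20, 128)        # (batch, seq_len, hidden) LSTM outputs
max_pool, _ = torch.max(h_lstm, 1)      # elementwise max over the 20 timesteps
avg_pool = torch.mean(h_lstm, 1)        # mean over the 20 timesteps
print(max_pool.shape, avg_pool.shape)   # torch.Size([4, 128]) torch.Size([4, 128])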

What is a correct architecture for a multi-class problem, given that I want the output to be the class a sentence belongs to? Which loss function is the right one? Any help is appreciated.

Regards,

Why LSTM instead of GRU? Why the output and not the hidden state of an RNN? Why ReLU and not tanh, sigmoid, etc.? Why not more linear layers before the output? …? :slight_smile:

There's no one-size-fits-all solution, even for basic network models and/or for the same task (e.g., sentence classification). At the end of the day, there's a lot of trial and error involved when it comes to training. Sure, it's a good idea to start with tried-and-tested approaches. For example, I use the hidden state of an RNN to train a text classifier.
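In case it helps, here's a minimal sketch of that kind of setup (all sizes and names below are placeholders, not taken from your snippets): the last hidden state of the LSTM feeds a linear layer, and nn.CrossEntropyLoss handles the multi-class objective, so the model just returns raw logits, one per class.

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                 # x: (batch, seq_len) token ids
        emb = self.embedding(x)           # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(emb)      # h_n: (num_layers, batch, hidden_dim)
        return self.out(h_n[-1])          # logits: (batch, num_classes)

# CrossEntropyLoss expects raw logits and integer class labels.
model = LSTMClassifier(vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=5)
criterion = nn.CrossEntropyLoss()
x = torch.randint(0, 10000, (4, 20))      # dummy batch: 4 sentences, 20 tokens each
labels = torch.randint(0, 5, (4,))        # dummy class labels
loss = criterion(model(x), labels)

At inference time the predicted class is then simply model(x).argmax(dim=1).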