Stacked LSTM vs simple LSTM

Good evening,

This is more of a general question. I am new to PyTorch and had forgotten that stacked LSTMs exist. I realized that setting the num_layers parameter to a value greater than 1 when initializing the LSTM makes it stacked. My question now is: in what cases should a stacked LSTM be preferred over a simple one? Is num_layers a hyperparameter to be fine-tuned? Or is it more task-specific?

For context, I am doing a sequence classification task on textual posts to predict the verdict. Is it the case that a stacked LSTM is always preferred? Or is this just another hyperparameter to fine-tune?

The choice between a simple (single-layer) LSTM and a stacked LSTM depends on task complexity, dataset size, and computational resources. Stacked LSTMs can capture more complex patterns but may require larger datasets and more resources. The number of layers (num_layers) is a hyperparameter to fine-tune based on validation set performance, so a stacked LSTM is not always the better choice. Experiment and evaluate different configurations to find the best model for your task. In general:

  1. Task complexity: If your task involves capturing more complex and hierarchical patterns, a stacked LSTM might be a better choice, as it can learn different levels of abstraction across layers.

  2. Training time and computational resources: Stacked LSTMs typically take longer to train and require more computational resources. If you have limited resources, a single-layer LSTM might be a better option.
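
To make the trade-off concrete, here is a minimal sketch of a sequence classifier whose depth is set by num_layers, built at three depths to show how the parameter count grows. The class name LSTMClassifier, the helper count_parameters, and all sizes (vocabulary of 1000, embedding size 100, hidden size 128, 4 classes) are assumptions for illustration, not values from your task, and the random batch stands in for real tokenized posts.

```python
import torch
import torch.nn as nn


class LSTMClassifier(nn.Module):
    """Hypothetical sequence classifier; depth is controlled by num_layers."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes,
                 num_layers=1, dropout=0.0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(
            embed_dim,
            hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            # Inter-layer dropout only applies when there is more than one layer.
            dropout=dropout if num_layers > 1 else 0.0,
        )
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)      # h_n: (num_layers, batch, hidden_dim)
        return self.fc(h_n[-1])                # classify from the top layer's final state


def count_parameters(model):
    """Number of trainable parameters in the model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


torch.manual_seed(0)
# Placeholder batch: 8 sequences of 20 token ids (assumed sizes).
batch = torch.randint(0, 1000, (8, 20))

param_counts = {}
for num_layers in (1, 2, 3):
    model = LSTMClassifier(1000, 100, 128, 4, num_layers=num_layers, dropout=0.2)
    logits = model(batch)
    param_counts[num_layers] = count_parameters(model)
    print(num_layers, tuple(logits.shape), param_counts[num_layers])
```

The output shape stays the same at every depth, so num_layers can be swept like any other hyperparameter: train one model per value and keep the one with the best validation score. Each extra layer adds a full set of LSTM weights, which is where the extra training time and memory come from.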