Is LSTM always better than GRU?

Hi, I am learning PyTorch now.
I was wondering whether there are conditions under which GRU outperforms LSTM.
Another question concerns the number of layers in an LSTM or GRU:
in which cases do we need 2+ layers of LSTM or GRU?

Thank you for your time and the help.

if there are some conditions that GRU outperforms LSTM?

in which case we need to use 2+ layers of LSTM or GRU?
We need them when we want more model capacity.
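
To make the capacity point concrete, here is a minimal sketch of stacking layers through the `num_layers` argument of `nn.GRU` (the same argument exists on `nn.LSTM`). The sizes are arbitrary values picked only for illustration, and `count_params` is a small helper defined here, not a PyTorch function.

```python
import torch
import torch.nn as nn

# Arbitrary sizes, chosen only for illustration.
input_size, hidden_size, seq_len, batch = 10, 20, 5, 3

one_layer = nn.GRU(input_size, hidden_size, num_layers=1, batch_first=True)
two_layer = nn.GRU(input_size, hidden_size, num_layers=2, batch_first=True)

def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

x = torch.randn(batch, seq_len, input_size)
out, h_n = two_layer(x)

print(out.shape)  # outputs of the top layer: (batch, seq_len, hidden_size)
print(h_n.shape)  # final hidden state of every layer: (num_layers, batch, hidden_size)
print(count_params(one_layer), count_params(two_layer))
```

The second layer takes the first layer's hidden states as its input, so the extra capacity comes with extra parameters and extra compute per timestep. Whether it helps is something to check on a validation set.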

Thanks Daniil. I will learn more about these two algorithms.

No… I think GRU is the default option rather than LSTM.

Hi Matthew, thanks for the help.

I find GRUs are less computationally demanding, hence faster to train, and better for smaller datasets.

LSTMs are better for large datasets, where models that retain information over more timesteps perform better.
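
The cost difference can be checked by counting parameters. This is a rough sketch with arbitrary sizes, and `count_params` is a helper defined here. In PyTorch, a GRU layer stores three weight blocks (reset gate, update gate, candidate state) and an LSTM layer stores four (input, forget, and output gates plus the cell candidate), so at equal sizes the GRU should have three quarters of the LSTM's parameters.

```python
import torch.nn as nn

# Arbitrary sizes, chosen only for illustration.
input_size, hidden_size = 128, 256

gru = nn.GRU(input_size, hidden_size)
lstm = nn.LSTM(input_size, hidden_size)

def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

gru_params = count_params(gru)
lstm_params = count_params(lstm)

print(gru_params, lstm_params)
print(gru_params / lstm_params)  # ratio of GRU to LSTM parameters
```

Fewer parameters does not by itself guarantee better results on small datasets. It only means there is less to fit and less to compute per step.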


dgriff, thank you for sharing your experience!

GRU may be less prone to overfitting on some small datasets, since it has only two gates while LSTM has three. I have recently been running experiments on a sound event detection problem, where GRU outperforms LSTM slightly.


Thanks. I will also try GRU on my datasets!

I tried to use LSTM instead of GRU in the "Translation with a Sequence to Sequence Network and Attention" tutorial,
but I ran into a problem.
Has anyone used LSTM in a seq2seq model based on this tutorial?
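
The exact problem is not stated, so this is only a guess at a common stumbling block when swapping `nn.GRU` for `nn.LSTM`: the LSTM's hidden state is a tuple `(h, c)` rather than a single tensor. The sketch below uses arbitrary sizes and is not taken from the tutorial's code.

```python
import torch
import torch.nn as nn

# Arbitrary sizes, chosen only for illustration.
hidden_size, batch = 16, 1
x = torch.randn(1, batch, hidden_size)   # (seq_len, batch, features)
h0 = torch.zeros(1, batch, hidden_size)  # (num_layers, batch, hidden_size)

# GRU: the hidden state is a single tensor.
gru = nn.GRU(hidden_size, hidden_size)
out_gru, h_gru = gru(x, h0)

# LSTM: the hidden state is a tuple of (hidden state, cell state).
lstm = nn.LSTM(hidden_size, hidden_size)
c0 = torch.zeros(1, batch, hidden_size)
out_lstm, (h_lstm, c_lstm) = lstm(x, (h0, c0))

print(h_gru.shape)                 # single tensor
print(h_lstm.shape, c_lstm.shape)  # two tensors of the same shape
```

If this is the issue, any code that treats the hidden state as one tensor (initialising it, passing it from encoder to decoder, or feeding it into the attention computation) would need to pass the whole tuple to the LSTM and use the `h` part wherever a tensor is expected.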