Is LSTM always better than GRU?

Hi, I am learning PyTorch now.
I was wondering: are there conditions under which GRU outperforms LSTM?
Another question is about the number of layers in an LSTM or GRU:
in which cases do we need to use 2+ layers of LSTM or GRU?

Thank you for your time and help.

1 Like

are there conditions under which GRU outperforms LSTM?
Yes.

in which cases do we need to use 2+ layers of LSTM or GRU?
We need them when we want more capacity.
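For example, stacking is just the num_layers argument of nn.LSTM / nn.GRU (a minimal sketch with made-up sizes, not from any particular model):

import torch
import torch.nn as nn

# A 2-layer GRU: the second layer consumes the hidden states produced by the first.
# input_size=32 and hidden_size=64 are arbitrary values for illustration.
rnn = nn.GRU(input_size=32, hidden_size=64, num_layers=2, dropout=0.2, batch_first=True)

x = torch.randn(8, 15, 32)    # (batch, seq_len, features)
output, h_n = rnn(x)          # output: (8, 15, 64), h_n: (2, 8, 64) -- one final hidden state per layer

The dropout argument only takes effect between stacked layers, so wanting recurrent dropout is another reason you might reach for num_layers >= 2.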

Thanks, Daniil. I will learn more about these two architectures.

No… I think GRU is a reasonable default choice rather than LSTM.

1 Like

Hi Matthew, thanks for the help.

I find GRUs are less computationally demanding, hence faster to train, and better suited to smaller datasets.

LSTMs are better for large datasets, where models that retain an understanding over longer timesteps perform better.
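As a rough illustration of the cost difference (a minimal sketch, the sizes are arbitrary), a GRU layer has about 3/4 of the parameters of an LSTM layer with the same sizes, which is roughly where the speed gap comes from:

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

gru = nn.GRU(input_size=128, hidden_size=256)
lstm = nn.LSTM(input_size=128, hidden_size=256)

print(n_params(gru))   # 296448 = 3 * (hidden*(input+hidden) + 2*hidden)
print(n_params(lstm))  # 395264 = 4 * (hidden*(input+hidden) + 2*hidden)

Fewer parameters also means fewer multiply-adds at every timestep, which adds up over long sequences.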

5 Likes

dgriff, thank you for sharing your experience!

GRU may be less prone to overfitting on some small datasets, since it only has two gates while LSTM has three. I have recently been running experiments on a sound event detection problem where GRU outperforms LSTM slightly.
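You can see the gate structure directly in the stacked weight matrices that PyTorch exposes (a minimal sketch, the sizes are arbitrary):

import torch.nn as nn

gru = nn.GRU(input_size=40, hidden_size=64)
lstm = nn.LSTM(input_size=40, hidden_size=64)

# GRU stacks 3 blocks of rows: reset gate, update gate, candidate state.
print(gru.weight_ih_l0.shape)     # torch.Size([192, 40])  -> 3 * hidden_size rows

# LSTM stacks 4 blocks: input, forget and output gates plus the cell candidate.
print(lstm.weight_ih_l0.shape)    # torch.Size([256, 40])  -> 4 * hidden_size rows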

2 Likes

Thanks. I will also try GRU on my datasets!

1 Like

I tried to implement LSTM instead of GRU in the Translation with a Sequence to Sequence Network and Attention tutorial:
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#sphx-glr-intermediate-seq2seq-translation-tutorial-py
but I ran into a problem.
Has anyone used LSTM in the Seq2Seq model based on this tutorial?
Thanks…
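For context, the change I am attempting looks roughly like this (a minimal sketch, not the tutorial's exact code; EncoderLSTM and the sizes are simplified stand-ins). The main API difference is that nn.LSTM returns (output, (hidden, cell)) where nn.GRU returns (output, hidden):

import torch
import torch.nn as nn

class EncoderLSTM(nn.Module):
    # Simplified stand-in for the tutorial's EncoderRNN, with nn.GRU swapped for nn.LSTM.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)

    def forward(self, input_ids):
        embedded = self.embedding(input_ids)      # (batch, seq_len, hidden_size)
        # Unlike nn.GRU, nn.LSTM returns a (h_n, c_n) tuple as its second output;
        # both tensors have to be passed on to the decoder's LSTM.
        output, (hidden, cell) = self.lstm(embedded)
        return output, (hidden, cell)

enc = EncoderLSTM(input_size=1000, hidden_size=256)
tokens = torch.randint(0, 1000, (4, 10))          # (batch, seq_len) of word indices
output, (hidden, cell) = enc(tokens)
print(output.shape, hidden.shape, cell.shape)     # (4, 10, 256) (1, 4, 256) (1, 4, 256)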