Is LSTM always better than GRU?

zhidali · July 28, 2017, 7:51am

Hi, I am learning PyTorch now.
I was wondering: are there conditions under which GRU outperforms LSTM?
My other question is about the number of layers in an LSTM or GRU:
in which cases do we need to use 2+ layers of LSTM or GRU?

Thank you for your time and help.

analvikingur · July 28, 2017, 8:30am

if there are some conditions that GRU outperforms LSTM?
Yes

in which case we need to use 2+ layers of LSTM or GRU?
We need them when we want more model capacity than a single layer provides.
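A minimal sketch of what "more layers" means in PyTorch, assuming illustrative sizes (16 input features, 32 hidden units, a batch of 4 sequences of length 10). Setting `num_layers=2` stacks a second GRU on top of the first, so the second layer reads the first layer's output sequence. The extra layer adds parameters (capacity), and `dropout` is applied only between stacked layers, so it does nothing useful with `num_layers=1`.

```python
import torch
import torch.nn as nn

# Illustrative sizes only.
input_size, hidden_size, seq_len, batch = 16, 32, 10, 4

# One layer vs. two stacked layers; the second layer consumes the
# hidden-state sequence produced by the first layer.
shallow = nn.GRU(input_size, hidden_size, num_layers=1, batch_first=True)
deep = nn.GRU(input_size, hidden_size, num_layers=2, batch_first=True, dropout=0.2)

x = torch.randn(batch, seq_len, input_size)
out_shallow, h_shallow = shallow(x)
out_deep, h_deep = deep(x)

def n_params(module):
    """Total number of trainable scalars in a module."""
    return sum(p.numel() for p in module.parameters())

print(out_deep.shape)  # torch.Size([4, 10, 32]): outputs of the top layer only
print(h_deep.shape)    # torch.Size([2, 4, 32]): one final hidden state per layer
print(n_params(shallow), n_params(deep))  # 4800 11136
```

The output shape is the same for both models; what changes is the depth of the transformation and the parameter count.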

zhidali · July 28, 2017, 7:52pm

Thanks Daniil. I will learn more about these two algorithms.

matthew_zeng · July 29, 2017, 12:21am

No… I think GRU is a default option rather than LSTM.

zhidali · July 29, 2017, 1:15am

Hi Matthew, thanks for the help.

dgriff · July 29, 2017, 1:22am

I find GRUs are less computationally demanding, hence faster to train and better for smaller datasets.

LSTMs are better for large datasets, where models that retain information over more time steps perform better.
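To put a number on "less computationally demanding": for the same sizes, a GRU layer has three weight blocks (reset gate, update gate, candidate state) while an LSTM layer has four (input, forget and output gates plus the cell candidate), so the GRU has 25% fewer parameters and does proportionally less matrix work per time step. A sketch assuming arbitrary sizes of 64 inputs and 128 hidden units; it illustrates only the cost difference, not the claim about longer time dependencies.

```python
import torch.nn as nn

input_size, hidden_size = 64, 128  # illustrative sizes

def n_params(module):
    """Total number of trainable scalars in a module."""
    return sum(p.numel() for p in module.parameters())

def expected_params(input_size, hidden_size, n_blocks):
    # Each block has an input-to-hidden matrix, a hidden-to-hidden matrix
    # and two bias vectors (PyTorch keeps b_ih and b_hh separately).
    per_block = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
    return n_blocks * per_block

gru = nn.GRU(input_size, hidden_size)    # 3 blocks: r, z, n
lstm = nn.LSTM(input_size, hidden_size)  # 4 blocks: i, f, g, o

print(n_params(gru), expected_params(input_size, hidden_size, 3))   # 74496 74496
print(n_params(lstm), expected_params(input_size, hidden_size, 4))  # 99328 99328
```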

zhidali · July 29, 2017, 8:46am

dgriff, thank you for sharing your experience!

BigeyeDestroyer · August 1, 2017, 3:47am

GRU may be less prone to overfitting on some small datasets, since it has only two gates while LSTM has three. I recently carried out experiments on a sound event detection problem where GRU slightly outperformed LSTM.
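The two GRU gates are the reset gate r and the update gate z; an LSTM has input, forget and output gates plus a separate cell state. The sketch below writes one GRU step by hand, following the equations in the `torch.nn.GRU` documentation, and checks it against the built-in `nn.GRUCell`. The helper name `gru_step` and the sizes are illustrative. Fewer gates means fewer parameters, which is the intuition behind the overfitting remark, though the effect is dataset-dependent.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, batch = 8, 16, 3  # illustrative sizes
cell = nn.GRUCell(input_size, hidden_size)

def gru_step(x, h, cell):
    # PyTorch stacks the reset (r), update (z) and candidate (n) blocks
    # along dim 0, in that order.
    W_ir, W_iz, W_in = cell.weight_ih.chunk(3, dim=0)
    W_hr, W_hz, W_hn = cell.weight_hh.chunk(3, dim=0)
    b_ir, b_iz, b_in = cell.bias_ih.chunk(3)
    b_hr, b_hz, b_hn = cell.bias_hh.chunk(3)

    r = torch.sigmoid(x @ W_ir.T + b_ir + h @ W_hr.T + b_hr)      # reset gate
    z = torch.sigmoid(x @ W_iz.T + b_iz + h @ W_hz.T + b_hz)      # update gate
    n = torch.tanh(x @ W_in.T + b_in + r * (h @ W_hn.T + b_hn))   # candidate state
    # Interpolate between the old state and the candidate; there is no
    # separate cell state as in an LSTM.
    return (1 - z) * n + z * h

x = torch.randn(batch, input_size)
h = torch.randn(batch, hidden_size)
with torch.no_grad():
    manual = gru_step(x, h, cell)
    builtin = cell(x, h)

print(torch.allclose(manual, builtin, atol=1e-5))  # True
```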

zhidali · August 1, 2017, 4:11pm

Thanks. I will also try GRU on my datasets!

Dania_Sagheer · December 10, 2018, 5:37pm

I am trying to use LSTM instead of GRU in the "Translation with a Sequence to Sequence Network and Attention" tutorial:
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#sphx-glr-intermediate-seq2seq-translation-tutorial-py
but I am running into a problem.
Has anyone used LSTM in a seq2seq model based on this tutorial?
Thanks…
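The post does not say what the problem was, so this is only a guess at a common stumbling block: `nn.GRU` takes and returns a single hidden tensor, while `nn.LSTM` takes and returns a tuple `(h, c)`. Swapping the layer therefore also means changing the initial hidden state and any code that uses the hidden state as a plain tensor (for example, a decoder that computes attention weights from it should use `h` from the tuple). A minimal encoder sketch with a hypothetical `cell_type` switch and `init_hidden` method; it processes one token at a time but is not the tutorial's code.

```python
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    """Hypothetical encoder; cell_type selects 'gru' or 'lstm'."""

    def __init__(self, vocab_size, hidden_size, cell_type="lstm"):
        super().__init__()
        self.hidden_size = hidden_size
        self.cell_type = cell_type
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        rnn_cls = nn.LSTM if cell_type == "lstm" else nn.GRU
        self.rnn = rnn_cls(hidden_size, hidden_size)

    def forward(self, token, hidden):
        # token: LongTensor of shape (1,), one time step at a time
        embedded = self.embedding(token).view(1, 1, -1)  # (seq=1, batch=1, hidden)
        output, hidden = self.rnn(embedded, hidden)
        return output, hidden

    def init_hidden(self):
        h0 = torch.zeros(1, 1, self.hidden_size)
        if self.cell_type == "lstm":
            # nn.LSTM expects (and returns) an (h, c) tuple, not a single tensor
            return (h0, torch.zeros(1, 1, self.hidden_size))
        return h0

# Usage: encode a short, made-up token sequence with the LSTM variant.
enc = EncoderRNN(vocab_size=100, hidden_size=32, cell_type="lstm")
hidden = enc.init_hidden()
for t in torch.tensor([5, 17, 42]):
    output, hidden = enc(t.view(1), hidden)
h_n, c_n = hidden  # the decoder must receive (and pass on) both tensors
```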