Hi, I am learning PyTorch right now.
I was wondering: are there conditions under which a GRU outperforms an LSTM?
My other question is about the number of layers in an LSTM or GRU:
in which cases do we need to stack two or more LSTM or GRU layers?
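A GRU may be less prone to overfitting on some small datasets, since it has only two gates (reset and update) while an LSTM has three (input, forget, and output), so it has fewer parameters for the same hidden size. I have recently been running experiments on a sound event detection problem, where the GRU outperforms the LSTM by a small margin.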
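
To make the gate-count point concrete, here is a rough sketch that counts the trainable parameters of `nn.LSTM` and `nn.GRU` at the same size. The sizes (`input_size=40`, `hidden_size=64`) and the `count_params` helper are made up for illustration and are not taken from my experiments. It also builds a 2-layer GRU to show how stacking is requested through `num_layers`.

```python
import torch
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Hypothetical helper: total number of trainable parameters."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Assumed sizes, chosen only for illustration.
input_size, hidden_size = 40, 64

lstm = nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=True)
gru = nn.GRU(input_size, hidden_size, num_layers=1, batch_first=True)

lstm_params = count_params(lstm)
gru_params = count_params(gru)

# PyTorch's LSTM stores 4 weight blocks per layer (3 gates + cell candidate),
# its GRU stores 3 (2 gates + candidate), so the ratio is 3/4.
print(f"LSTM: {lstm_params}, GRU: {gru_params}, "
      f"ratio: {gru_params / lstm_params:.2f}")

# Stacking: num_layers=2 feeds the first layer's outputs into a second layer.
gru2 = nn.GRU(input_size, hidden_size, num_layers=2, batch_first=True)
x = torch.randn(8, 100, input_size)  # (batch, time, features), random data
out, h_n = gru2(x)
# out holds the top layer's output at every time step;
# h_n holds the final hidden state of each layer.
print(out.shape, h_n.shape)
```

The smaller parameter count only shows that the GRU has less capacity at a given hidden size. Whether that actually helps against overfitting, and whether a second layer is worth it, depends on the dataset, so it is best checked on a validation set.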