Problem of Vanishing Gradients in GRU in Seq2Seq model

Hi @Usama_Hasan
Thank you so much
I have tried a few other things that may be related to this issue. Here is a summary of what I did:

  1. Sometimes your data consists of independent sequences: a window moves over the data, the data inside the window forms a sequence, and sequences from different windows are independent of each other. In that case you should call detach_() on the hidden state of the LSTM or GRU so that the history is cut at the window boundary. Otherwise, backpropagation through time runs all the way back to the beginning of the input history (see the first sketch after this list).
  2. If your data spans very different ranges of values, such as stock market prices, normalize it first. You can do this with, for example, MinMaxScaler() from the sklearn library (see the second sketch below).
  3. Don’t forget to apply gradient clipping in your model when training on sequential data (see the last sketch below).
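
For point 1, here is a minimal sketch of truncated backpropagation through time with a GRU. The shapes, the dummy data, and the `head` layer are just placeholders for illustration; swap in your own model and data loader:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)  # illustrative output layer
optimizer = torch.optim.Adam(
    list(gru.parameters()) + list(head.parameters()), lr=1e-3
)
criterion = nn.MSELoss()

hidden = torch.zeros(1, 4, 16)  # (num_layers, batch, hidden_size)
for step in range(100):
    window = torch.randn(4, 20, 8)  # dummy window: (batch, seq_len, features)
    target = torch.randn(4, 20, 1)  # dummy targets

    optimizer.zero_grad()
    output, hidden = gru(window, hidden)
    loss = criterion(head(output), target)
    loss.backward()
    optimizer.step()

    # Cut the autograd history at the window boundary, so the next
    # backward pass stops here instead of unrolling to the very first window.
    hidden.detach_()
```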
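
For point 2, a small normalization sketch with made-up price values. Note that in practice you would fit the scaler on the training split only and reuse it to transform validation/test data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative price series, shaped (n_samples, n_features)
prices = np.array([[101.2], [98.7], [105.4], [110.9], [95.3]])

scaler = MinMaxScaler()           # scales each feature to [0, 1]
scaled = scaler.fit_transform(prices)

# Invert the scaling to map model outputs back to price units
restored = scaler.inverse_transform(scaled)
```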
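
And for point 3, a sketch of where gradient clipping fits in a training step: between backward() and step(). The model, loss, and max_norm value are placeholders; max_norm is a hyperparameter you would tune:

```python
import torch
import torch.nn as nn

model = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 20, 8)        # dummy batch: (batch, seq_len, features)
output, _ = model(x)
loss = output.pow(2).mean()      # placeholder loss for illustration

loss.backward()
# Rescale gradients so their global norm is at most max_norm; this
# guards against exploding gradients in recurrent networks.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```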