I have noticed that there might be an error in the docs for the GRU layer.
Shouldn’t the hidden-state update in the 4th line of the definition be written as:
h(t) = (1 - z(t)) * h(t-1) + z(t) * n(t)
instead of: h(t) = (1 - z(t)) * n(t) + z(t) * h(t-1)
as it is currently written in the docs?
Here is the architecture of the GRU layer:
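Whether the two forms conflict depends on how the update gate z(t) is interpreted. Because sigmoid(-a) = 1 - sigmoid(a), negating every update-gate parameter turns z(t) into 1 - z(t), and that swaps the two updates. The stdlib-only sketch below demonstrates this for a single-unit GRU. It assumes the r(t), z(t) and n(t) equations take the form listed in the PyTorch GRU docs (W_ir, b_ir, W_hr, b_hr, and so on). The parameter values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x: float, h_prev: float, p: dict, form: str = "docs") -> float:
    """One step of a scalar (single-unit) GRU.

    Gate equations assume the form listed in the PyTorch GRU docs;
    `form` selects which hidden-state update is applied.
    """
    r = sigmoid(p["W_ir"] * x + p["b_ir"] + p["W_hr"] * h_prev + p["b_hr"])
    z = sigmoid(p["W_iz"] * x + p["b_iz"] + p["W_hz"] * h_prev + p["b_hz"])
    n = math.tanh(p["W_in"] * x + p["b_in"] + r * (p["W_hn"] * h_prev + p["b_hn"]))
    if form == "docs":
        # h(t) = (1 - z(t)) * n(t) + z(t) * h(t-1)
        return (1 - z) * n + z * h_prev
    if form == "proposed":
        # h(t) = (1 - z(t)) * h(t-1) + z(t) * n(t)
        return (1 - z) * h_prev + z * n
    raise ValueError(f"unknown form: {form!r}")


def negate_update_gate(p: dict) -> dict:
    """Flip the sign of every update-gate parameter, so z(t) becomes 1 - z(t)."""
    z_keys = {"W_iz", "b_iz", "W_hz", "b_hz"}
    return {k: (-v if k in z_keys else v) for k, v in p.items()}


# Hypothetical parameter values, for illustration only.
params = {
    "W_ir": 0.0, "b_ir": 0.0, "W_hr": 0.0, "b_hr": 0.0,
    "W_iz": 2.0, "b_iz": 0.0, "W_hz": 0.0, "b_hz": 0.0,
    "W_in": 1.0, "b_in": 0.0, "W_hn": 0.0, "b_hn": 0.0,
}

h_docs = gru_step(1.0, 0.0, params, "docs")
h_proposed = gru_step(1.0, 0.0, params, "proposed")
h_proposed_flipped = gru_step(1.0, 0.0, negate_update_gate(params), "proposed")
print(h_docs, h_proposed, h_proposed_flipped)
```

With the same parameters the two updates give different values. Once the update-gate parameters are negated, however, the proposed form reproduces the docs' form exactly (up to floating-point rounding). So for a given set of learned weights, the written equation has to match the implementation, but neither convention is more expressive than the other.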
Thanks for raising this issue. Would you mind creating a GitHub issue so that the code owners can take a look there, too?