On the left side is the PyTorch official implementation.
On the right side are the formulas I took from Wikipedia and A. Karpathy's article.

Both formulas claim to be Elman implementations. But Wikipedia and A. Karpathy's article use three sets of weights, W_i, W_h, and W_y, while the PyTorch implementation has only two.

The third set of weights is for the output vector. An Elman RNN produces both a hidden state and an output y, so there is a third weight matrix, W_y, that maps the hidden state to the output. The PyTorch doc describes only the hidden-state update, which involves just two weight matrices.
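A minimal NumPy sketch makes the correspondence concrete (dimensions and variable names here are illustrative, not from either source): the line computing h_t uses only the two weight matrices that appear in the PyTorch equation, while y_t needs the extra W_y that Wikipedia and Karpathy include. In PyTorch you would get y_t by stacking a separate nn.Linear on top of nn.RNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions
input_size, hidden_size, output_size = 3, 4, 2

# The two weight matrices in the PyTorch hidden-state equation
W_ih = rng.standard_normal((hidden_size, input_size))   # input-to-hidden  (W_i)
W_hh = rng.standard_normal((hidden_size, hidden_size))  # hidden-to-hidden (W_h)
b_ih = rng.standard_normal(hidden_size)
b_hh = rng.standard_normal(hidden_size)

# The third matrix from Wikipedia/Karpathy: hidden-to-output (W_y)
W_y = rng.standard_normal((output_size, hidden_size))
b_y = rng.standard_normal(output_size)

def elman_step(x, h_prev):
    # Hidden-state update: this is all that PyTorch's nn.RNN computes
    h_t = np.tanh(W_ih @ x + b_ih + W_hh @ h_prev + b_hh)
    # Output projection: needs the third weight matrix W_y
    y_t = W_y @ h_t + b_y
    return h_t, y_t

x = rng.standard_normal(input_size)
h0 = np.zeros(hidden_size)
h1, y1 = elman_step(x, h0)
print(h1.shape, y1.shape)  # (4,) (2,)
```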

On the left, the PyTorch docs specify the tanh activation function, while the equation on the right uses a generic activation function.

The right-hand side equation also has only one bias term, whereas PyTorch uses two (b_ih and b_hh). Since the two biases are simply added together inside the activation, the forms are equivalent.