Let’s say I have an input text sequence of length 5.

And I applied a non-linear layer of size 20, then an LSTM layer with hidden state of size 40.

So what does that mean here? Is each hidden node in the LSTM connected to all 20 nodes of the linear layer?

If yes, then for a particular sequence, let’s say only 10 nodes fired: how does the LSTM know the time steps, i.e. which node comes first and which comes next?

In the simplest RNN, there is only one *non-linear*, *fully connected* layer which, at each step `t`, takes as input:

- the vector at position `t` in the sequence: `x_t` (in your case, its size is 20);
- its own output vector from the previous step: `h_(t-1)` (in your case, its size is 40);

and outputs the new vector `h_t` (in your case, also of size 40). This vector is the hidden state at step `t`.
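To make this concrete, here is a minimal sketch in PyTorch using `torch.nn.RNN` with the sizes from your question (input size 20, hidden size 40, sequence length 5). Note how the layer consumes the sequence step by step in order, which is how it knows which vector comes first:

```python
import torch

# Sizes taken from the question: input vectors of size 20,
# hidden state of size 40, sequence of length 5.
input_size, hidden_size, seq_len = 20, 40, 5

rnn = torch.nn.RNN(input_size, hidden_size, batch_first=True)

x = torch.randn(1, seq_len, input_size)      # one sequence of 5 vectors x_t
h0 = torch.zeros(1, 1, hidden_size)          # initial hidden state h_0

# output stacks h_t for every step t; h_n is the last hidden state
output, h_n = rnn(x, h0)

print(output.shape)  # torch.Size([1, 5, 40]) -- one h_t per time step
print(h_n.shape)     # torch.Size([1, 1, 40]) -- final hidden state h_5
```

The same single fully connected layer is reused at every step; the order of the sequence is encoded simply by the order in which the `x_t` are fed in, each combined with the previous `h_(t-1)`.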

An LSTM is much more complex (see the docs for how everything is computed), but the overall idea is the same.
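For comparison, here is the same sketch with `torch.nn.LSTM` and the same assumed sizes. The interface is nearly identical; the main visible difference is that an LSTM carries a second recurrent vector, the cell state `c_t`, alongside the hidden state `h_t`:

```python
import torch

# Same assumed sizes as above: input 20, hidden 40, sequence length 5.
lstm = torch.nn.LSTM(20, 40, batch_first=True)

x = torch.randn(1, 5, 20)

# With no initial state given, h_0 and c_0 default to zeros.
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([1, 5, 40]) -- h_t for each step
print(h_n.shape)     # torch.Size([1, 1, 40]) -- final hidden state
print(c_n.shape)     # torch.Size([1, 1, 40]) -- final cell state
```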
