where \$x_t\$ is the input word embedding. The special symbol `<pad>` denotes zero padding. \$h_{0}^{l_1}\$ and \$h_{0}^{l_2}\$ are initialized as zero vectors.
So how should I handle the case where an LSTM layer needs multiple inputs? I think I need `LSTMCell`, because layer 1 and layer 2 are wired differently. But `LSTMCell` only takes `input`, `h`, and `c`. How can I implement formulas (2) and (3) with it? Or should I concatenate \$y_{t-1}\$ and \$s_{t-1}^{l_2}\$ and pass the result as `input`?
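
To make the concatenation idea concrete, here is a minimal sketch of what I have in mind. It assumes layer 1 receives \$y_{t-1}\$ concatenated with \$s_{t-1}^{l_2}\$ and layer 2 receives only the new layer-1 hidden state; that wiring is my reading, not necessarily what formulas (2) and (3) specify. The class name `TwoLayerDecoderStep`, the sizes, and the random stand-in embeddings are all made up for illustration.

```python
import torch
import torch.nn as nn


class TwoLayerDecoderStep(nn.Module):
    """One time step of a two-layer decoder built from LSTMCell (hypothetical wiring)."""

    def __init__(self, embed_dim, hidden_dim):
        super().__init__()
        # Layer 1 input is [y_{t-1}; s_{t-1}^{l_2}], so its input size is the sum.
        self.cell1 = nn.LSTMCell(embed_dim + hidden_dim, hidden_dim)
        # Layer 2 input is the fresh hidden state of layer 1.
        self.cell2 = nn.LSTMCell(hidden_dim, hidden_dim)

    def forward(self, y_prev, state1, state2):
        h2_prev, _ = state2
        # Concatenate along the feature dimension: (batch, embed_dim + hidden_dim).
        x1 = torch.cat([y_prev, h2_prev], dim=1)
        h1, c1 = self.cell1(x1, state1)
        h2, c2 = self.cell2(h1, state2)
        return (h1, c1), (h2, c2)


batch, embed_dim, hidden_dim, steps = 4, 8, 16, 5
step = TwoLayerDecoderStep(embed_dim, hidden_dim)

# Hidden and cell states of both layers start as zero vectors.
state1 = (torch.zeros(batch, hidden_dim), torch.zeros(batch, hidden_dim))
state2 = (torch.zeros(batch, hidden_dim), torch.zeros(batch, hidden_dim))

# The first input is <pad>, represented here as an all-zero embedding.
y_prev = torch.zeros(batch, embed_dim)

outputs = []
for t in range(steps):
    state1, state2 = step(y_prev, state1, state2)
    outputs.append(state2[0])
    # Stand-in for the embedding of the previously generated word.
    y_prev = torch.randn(batch, embed_dim)

outputs = torch.stack(outputs, dim=1)  # (batch, steps, hidden_dim)
```

The loop is written out by hand because `nn.LSTM` with `num_layers=2` only feeds each layer the output of the layer below, so it cannot route the layer-2 state back into layer 1 at the next step.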