Having different behavior on LSTM

This problem isn’t directly related to NLP, but since that’s where most people use LSTM layers I thought I would tag it this way.

My problem is relatively simple, but I’m not sure how to implement it.

Basically I have a time sequence of n elements that I need to pass through different LSTM layers. It goes something like this:

First element of sequence:

  1. First element of the sequence is passed into the first LSTM layer.
  2. Nothing is passed into the second LSTM layer.
  3. Output of (1) is added with output of (2) and that is passed into the third LSTM layer.
  4. Output of (3) is passed into the fourth LSTM layer.

Second element of sequence:

  1. Second element of sequence is passed into the first LSTM layer.
  2. Output of step (4) for the first element is passed into the second LSTM layer.
  3. Output of (1) is added with output of (2) and that is passed into the third LSTM layer.
  4. Output of (3) is passed into the fourth LSTM layer.

n-th element of sequence:

  1. The n-th element of the sequence is passed into the first LSTM layer.
  2. Output of step (4) for the (n-1)-th element is passed into the second LSTM layer.
  3. Output of (1) is added with output of (2) and that is passed into the third LSTM layer.
  4. Output of (3) is passed into the fourth LSTM layer.

So my goal is a recurrence where the output of one of the layers modifies how the whole network behaves on the next step. Since for the very first element there is no output of (4) to pass into (2), I simply don’t know how to write proper code to make this happen. It seems I could solve this with nn.LSTMCell and an if statement in the for loop that passes zeros for the first element and the stored result of the last (4) on subsequent elements, though I’m not sure how this would affect training. On top of that, an LSTMCell loop is not optimized for the GPU the way nn.LSTM is (it doesn’t use the fused cuDNN kernels), which greatly slows training.
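For what it’s worth, here is a minimal sketch of the LSTMCell approach described above. The module name `FeedbackLSTM` and the sizes are my own placeholders, and instead of an if statement it initializes the feedback tensor to zeros before the loop, which covers the first element and stays differentiable for training:

```python
import torch
import torch.nn as nn

class FeedbackLSTM(nn.Module):
    """Four LSTM cells where the output of layer 4 at step t-1
    is fed into layer 2 at step t (hypothetical name and sizes)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        self.cell1 = nn.LSTMCell(input_size, hidden_size)
        # layer 2 receives the previous step's layer-4 output as its input
        self.cell2 = nn.LSTMCell(hidden_size, hidden_size)
        self.cell3 = nn.LSTMCell(hidden_size, hidden_size)
        self.cell4 = nn.LSTMCell(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, input_size)
        batch = x.size(1)
        zeros = lambda: torch.zeros(batch, self.hidden_size,
                                    device=x.device, dtype=x.dtype)
        h1, c1 = zeros(), zeros()
        h2, c2 = zeros(), zeros()
        h3, c3 = zeros(), zeros()
        h4, c4 = zeros(), zeros()
        feedback = zeros()  # stands in for layer 4's output at step -1
        outputs = []
        for t in range(x.size(0)):
            h1, c1 = self.cell1(x[t], (h1, c1))       # step (1)
            h2, c2 = self.cell2(feedback, (h2, c2))   # step (2)
            h3, c3 = self.cell3(h1 + h2, (h3, c3))    # step (3): sum of (1) and (2)
            h4, c4 = self.cell4(h3, (h4, c4))         # step (4)
            feedback = h4  # fed into layer 2 on the next iteration
            outputs.append(h4)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)
```

Gradients flow back through the feedback connection automatically, since `feedback` is just another tensor in the graph. The whole loop runs on the GPU if the module and input are moved there; it is only the cuDNN fusion that is lost compared to nn.LSTM.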

So my question boils down to: is there a way to do something like this with the nn.LSTM module, or a way to run nn.LSTMCell efficiently on the GPU?