What is the best way to train a custom RNN that feeds its own output back as input?

I have a recurrent model that computes its output from the hidden state h_t of an LSTM and the state m_t of a memory module. In the next time step, that output should be fed back as input after passing through another submodule. To be clear:

y_t = f(x_t, h_t, m_t)
h_t = g(x_t, y_(t-1), h_(t-1))
m_t = p(x_t, h_t, m_(t-1))

I know that the cuDNN implementation of LSTM with packed sequences is efficient. However, it does not accept the output of the previous step as input. I have thought about teacher forcing, in which we feed the ground truth instead of the previous output, but then I would also have to supply the content of m_t, which is an internal state. The only solution I came up with is using LSTMCell and implementing the other modules with two for loops (one over the sequences, one over the time steps in each sequence).
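Here is a minimal sketch of what I mean. The names FeedbackRNN, out_proj, and mem_update are just placeholders, and f and p are stand-ins implemented as linear layers; padding the sequences into one batch removes the per-sequence loop, so only the time loop remains:

```python
import torch
import torch.nn as nn

class FeedbackRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, mem_size):
        super().__init__()
        # g: the LSTMCell consumes x_t concatenated with the previous output y_(t-1)
        self.cell = nn.LSTMCell(input_size + output_size, hidden_size)
        # f: produces y_t from x_t, h_t, m_t (placeholder: a single linear layer)
        self.out_proj = nn.Linear(input_size + hidden_size + mem_size, output_size)
        # p: updates the memory m_t from x_t, h_t, m_(t-1) (also a placeholder)
        self.mem_update = nn.Linear(input_size + hidden_size + mem_size, mem_size)
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.mem_size = mem_size

    def forward(self, x):  # x: (seq_len, batch, input_size), padded
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        m = x.new_zeros(batch, self.mem_size)
        y = x.new_zeros(batch, self.output_size)
        outputs = []
        for t in range(seq_len):  # only the time loop is sequential; the batch runs in parallel
            h, c = self.cell(torch.cat([x[t], y], dim=1), (h, c))
            m = self.mem_update(torch.cat([x[t], h, m], dim=1))
            y = self.out_proj(torch.cat([x[t], h, m], dim=1))
            outputs.append(y)
        return torch.stack(outputs)  # (seq_len, batch, output_size)
```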

Is there a more efficient way of implementing such a network? I want to accelerate it on a GPU, but do not want kernel launches to become a bottleneck.

Well, because you always have dependencies on the previous time step in recurrent networks, there isn’t a magic way to fully untangle t-1 from t.

There are recurrent models that restructure the computation to do as much of it as possible in parallel on the GPU, such as the simple recurrent unit (SRU).
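For reference, a minimal usage sketch of the `sru` package; the argument names are taken from its README as I remember them, so treat them as assumptions. Note it replaces the LSTM itself, not the y_(t-1) feedback, so it only helps the parts of your model without the output loop:

```python
import torch
from sru import SRU  # pip install sru

rnn = SRU(input_size=128, hidden_size=256, num_layers=2).cuda()
x = torch.randn(35, 4, 128).cuda()  # (seq_len, batch, input_size)
output, state = rnn(x)              # output: (seq_len, batch, hidden_size)
```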

It does speed up computation, but it’s also a bit more buggy :slight_smile: Once I get my seq2seq model working with standard LSTMCell, I’m going to try it again.
