[solved] Train initial hidden state of RNNs

Yeah this doesn’t seem to be what you want to do.

I think what you want is what is done here in the original Torch implementation of an NTM.


You want to initialize your memory matrix as a plain trainable variable (drawn from a normal distribution, or all constant values), then pass it through a linear layer. That way you learn a layer that produces your initial hidden state.
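A minimal sketch of that idea in PyTorch (class and parameter names are mine, not from the NTM code): a learnable seed vector is mapped through a linear layer to produce `h_0`, which is then expanded across the batch.

```python
import torch
import torch.nn as nn

class RNNWithLearnedInit(nn.Module):
    """Sketch: learn the initial hidden state of a GRU.

    A plain trainable seed vector (normal init here; torch.ones would
    give a constant init) is passed through a linear layer, and the
    result is used as h_0. Gradients flow into both the linear layer
    and the seed itself.
    """
    def __init__(self, input_size, hidden_size, seed_size=16):
        super().__init__()
        # Trainable seed vector, initialized from a normal distribution.
        self.init_seed = nn.Parameter(torch.randn(seed_size))
        # Linear layer that maps the seed to the initial hidden state.
        self.init_linear = nn.Linear(seed_size, hidden_size)
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        batch_size = x.size(0)
        # Compute h_0 from the learned seed (tanh keeps it in the
        # GRU's usual hidden-state range).
        h0 = torch.tanh(self.init_linear(self.init_seed))
        # Shape it to (num_layers, batch, hidden) as nn.GRU expects.
        h0 = h0.expand(batch_size, -1).unsqueeze(0).contiguous()
        return self.rnn(x, h0)

model = RNNWithLearnedInit(input_size=8, hidden_size=32)
out, h_n = model(torch.randn(4, 10, 8))
print(out.shape)  # torch.Size([4, 10, 32])
```

Because `init_seed` is an `nn.Parameter`, the optimizer updates it along with everything else, so the network learns where to start from rather than always starting at zeros.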