How does one decide whether to pass the softmax or a one-hot vector to the next cell of an RNN/LSTM/GRU?

My RNN/LSTM/GRU outputs a softmax distribution over tokens. How do I decide whether to pass the soft vector or a thresholded/sampled one-hot version of it as input to the next cell?

(Teacher forcing is not possible in my application.)

Hi pinocchio,
You should pass the direct (soft) output of the RNN to the next cell. Sampling or thresholding is a non-differentiable step, so it introduces a discontinuity in the graph, and gradients won't backpropagate through time along that fed-back input.
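
To make the point above concrete, here is a minimal sketch in plain Python (no framework, so it is not tied to any particular RNN implementation). It assumes a hypothetical 3-token vocabulary with made-up 2-d embeddings, builds the next cell's input from either the soft vector or the argmax one-hot, and uses a finite difference to estimate how that input reacts to a small change in one logit. All names and numbers here are illustrative assumptions, not taken from the original post.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot_argmax(logits):
    # Hard thresholding: 1.0 at the largest logit, 0.0 elsewhere
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]

def next_input(token_vector, embeddings):
    # Weighted sum of embedding rows; a one-hot vector selects a single row
    dim = len(embeddings[0])
    return [sum(w * row[d] for w, row in zip(token_vector, embeddings))
            for d in range(dim)]

def finite_diff(fn, logits, index, eps=1e-4):
    # Forward-difference estimate of d fn(logits) / d logits[index]
    bumped = list(logits)
    bumped[index] += eps
    base, moved = fn(logits), fn(bumped)
    return [(b - a) / eps for a, b in zip(base, moved)]

# Hypothetical values chosen only for illustration
logits = [2.0, 0.5, -1.0]
embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

soft_grad = finite_diff(lambda z: next_input(softmax(z), embeddings), logits, 1)
hard_grad = finite_diff(lambda z: next_input(one_hot_argmax(z), embeddings), logits, 1)

print("soft path sensitivity:", soft_grad)  # non-zero entries
print("hard path sensitivity:", hard_grad)  # all zeros
```

The hard path is piecewise constant in the logits, so its sensitivity is zero almost everywhere, while the soft path changes smoothly. In this sketch the soft vector is turned into an input by mixing embedding rows, which is one common choice but an assumption here.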
