I have an LSTM that I have returning one-hot vectors over the vocabulary, so the output dimension is just the vocabulary size. The problem is that it randomly generates SOS and PAD tokens where I don't want them (for example in the middle of a sentence).
How are LSTM models typically set up to avoid this?
Note that simply removing them from the vocabulary causes another problem: the first token I feed the LSTM can then no longer be SOS.
That sounds like your LSTM may be undertrained. You may want to train for longer so the network can learn to produce better sequences.
Another thing you could do is perform beam search during sequence generation. This lets you explore alternative continuations when you suspect an SOS/PAD token has appeared unnaturally.
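As a rough sketch of what I mean by beam search (this is a generic, framework-agnostic version; `step_fn` is a stand-in for whatever produces your LSTM's next-token log-probabilities, not a real API):

```python
import math

def beam_search(step_fn, sos, eos, beam_width=3, max_len=10):
    """Minimal beam search sketch.

    step_fn(prefix) must return a dict mapping next-token id -> log-probability
    (for an LSTM this would be the log-softmax of the output logits).
    Returns the highest-scoring finished sequence, with the SOS token dropped.
    """
    beams = [([sos], 0.0)]          # each beam: (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq).items():
                candidates.append((seq + [tok], score + logp))
        # keep the best `beam_width` unfinished beams; set finished ones aside
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == eos:
                finished.append((seq, score))
            elif len(beams) < beam_width:
                beams.append((seq, score))
        if not beams:               # every surviving candidate has ended in EOS
            break
    if not finished:                # nothing terminated within max_len
        finished = beams
    best_seq, _ = max(finished, key=lambda c: c[1])
    return best_seq[1:]             # drop the leading SOS
```

Because each beam keeps a cumulative score, a hypothesis where SOS/PAD showed up mid-sentence with low probability will usually be outscored by a cleaner alternative and pruned, rather than being locked in the way greedy decoding would.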
Oh, OK. So it's normal to allow the network to output stray SOS and PAD tokens? It just feels strange to me to leave them in the vocabulary, knowing that they are nonsense mid-sequence!
If it outputs SOS, is the network usually just penalized through the cross-entropy loss?