I’m trying to implement a character-level RNN for spell correction and tokenization. The model is based on the practical-pytorch batched seq2seq GRU implementation (https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation-batched.py) — the loss function is the masked cross entropy they use there, and I’m using a 2-layer bidirectional GRU for my encoder/decoder. I’m also using scheduled sampling with a teacher forcing ratio of 0.5.
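For reference, here is a minimal sketch of the kind of masked cross-entropy loss I mean — this is my own simplified version, not the exact code from the repo, and `PAD_IDX = 0` is an assumption (substitute whatever index your vocabulary uses for padding):

```python
import torch
import torch.nn.functional as F

PAD_IDX = 0  # assumed padding index; adjust to your vocab


def masked_cross_entropy(logits, targets):
    """Cross entropy averaged over non-PAD target positions only.

    logits:  (batch, seq_len, vocab_size) raw decoder scores
    targets: (batch, seq_len) token indices, PAD_IDX at padded positions
    """
    vocab_size = logits.size(-1)
    # ignore_index masks out PAD positions so they contribute no gradient
    loss = F.cross_entropy(
        logits.reshape(-1, vocab_size),
        targets.reshape(-1),
        ignore_index=PAD_IDX,
        reduction="sum",
    )
    n_tokens = (targets != PAD_IDX).sum()
    return loss / n_tokens.clamp(min=1)
```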
The problem I’m having is that the system struggles to output end tokens. Sentences such as ‘do3s t4is w0rk’ are correctly changed to ‘does this work’, but then random tokens are emitted until either the system reaches its maximum output length or seemingly at random puts down an EOS token, making the output something like ‘does this workkkkkk orwk EOS’.
Does anyone have any insight into this problem? I tried unmasking my loss function, but that led to the system outputting PAD tokens instead. The system is able to overfit very easy datasets, so I don’t think there’s a bug in the training loop. Any input would be appreciated!