How to handle unknown and rare words with the Transformer model

Hi all,
I am using a generic Transformer model for translation, and because of multi-headed attention it seems that aligning target tokens to source tokens, so that an unknown token can be replaced with its source-side counterpart, is not trivial. (A rough sketch of what I mean is below.)
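
To make the difficulty concrete, here is a minimal sketch (not from the original post) of the attention-based replacement idea: take the cross-attention weights from one decoder layer, average over heads to get a soft alignment, and copy the most-attended source token wherever the output is `<unk>`. The tensor shapes, token lists, and random weights are made-up illustrations; in practice the weights come from the model, and averaging heads is only a heuristic that often does not yield a clean alignment, which is exactly the problem.

```python
import numpy as np

src_tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
tgt_tokens = ["La", "<unk>", "est", "à", "Paris", "."]

# attn: (num_heads, tgt_len, src_len) cross-attention weights; random here
# purely for illustration -- a real run would use the decoder's weights.
rng = np.random.default_rng(0)
attn = rng.random((8, len(tgt_tokens), len(src_tokens)))
attn /= attn.sum(axis=-1, keepdims=True)   # normalize each head's rows

alignment = attn.mean(axis=0)              # heuristic: average over heads
replaced = [
    src_tokens[alignment[i].argmax()] if tok == "<unk>" else tok
    for i, tok in enumerate(tgt_tokens)
]
print(replaced)
```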

Does anyone know of other techniques for replacing unknown or rare words that work with the Transformer model? Do you know what the standard industry practice is (in addition to using BPE or SentencePiece to mitigate the problem)? Any idea how Facebook or Google tackle this challenge?
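
For reference, here is a minimal sketch of the subword mitigation I mentioned, assuming a SentencePiece model already trained on the training corpus (the path "spm.model" is a placeholder, and the example output depends entirely on that model). The point is that rare words get split into known pieces, so the decoder rarely has to emit a literal `<unk>` at all.

```python
import sentencepiece as spm

# "spm.model" is a hypothetical path to a trained SentencePiece model.
sp = spm.SentencePieceProcessor(model_file="spm.model")

# A rare word is segmented into in-vocabulary subword pieces.
pieces = sp.encode("anticonstitutionnellement", out_type=str)
print(pieces)  # e.g. ['▁anti', 'constitution', 'nelle', 'ment'] (model-dependent)
```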

Thanks