How can a word typo detection network integrate a typo confusion set?

I want to design a word typo detection network based on the attention mechanism, and I would like to integrate a typo confusion set into the network's training. My goal is that, during training, the network attends to the correct words that correspond to each possibly wrong word. What would be a reasonable way to design such a network?

The typo confusion set contains many typo confusion pairs:

Each typo confusion pair can be represented as:


which indicates that right_word may be mistyped as wrong_word1, wrong_word2, ..., or wrong_wordn.
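To make the idea concrete, here is a minimal sketch of one way the confusion set might be used, not a definitive design. It assumes the confusion set is stored as a mapping from each correct word to its typos, inverts it so that every possibly wrong word points to its candidate corrections, and then restricts the attention weights to those candidates. The example words, the `vocab` list, the placeholder scores, and the function names (`invert`, `masked_softmax`, `candidate_attention`) are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical confusion set: correct word -> typos it is often written as.
confusion_set = {
    "their": ["thier", "there"],
    "there": ["their", "ther"],
    "receive": ["recieve"],
}

def invert(confusion):
    """Map each possibly wrong word to the correct words it may stand for."""
    candidates = defaultdict(list)
    for right_word, wrong_words in confusion.items():
        for wrong_word in wrong_words:
            candidates[wrong_word].append(right_word)
    return dict(candidates)

def masked_softmax(scores, mask):
    """Softmax over positions where mask is True; masked positions get weight 0."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        return [0.0] * len(scores)
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def candidate_attention(token, vocab, scores, candidates):
    """Attention weights over vocab, limited to the token and its candidate corrections."""
    allowed = set(candidates.get(token, [])) | {token}
    mask = [word in allowed for word in vocab]
    return masked_softmax(scores, mask)

candidates = invert(confusion_set)
vocab = ["their", "there", "receive", "thier"]
# Placeholder scores; in a real model these would be query-key dot products.
weights = candidate_attention("thier", vocab, [1.0, 0.5, 2.0, 0.0], candidates)
```

In a trained network the same mask could be applied to the attention logits before the softmax, so that a token such as "thier" can only attend to itself and to "their". Whether a hard mask like this or a softer additive bias works better is something that would have to be checked experimentally.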