Improving NMT model outputs

Dear @All

I have an NMT project based on the Transformer, and everything is working fine, except that in the final translation output the model translates numbers to zeros (0000) and non-English names to <UNK>.

I need to solve these issues to increase my BLEU score. Any suggestions?

Rare words or out-of-vocabulary words are a fundamental challenge for NMT. You still find very recent academic papers addressing this.

For example, for a very simple NMT task, I used an off-the-shelf NER system to replace, say, person names with a placeholder token. So the 2 sentences “I met Alice” and “I met Bob” would both be converted to “I met” followed by the placeholder; same for the target sentences. After the translation, I would simply replace the placeholder with the actual name. Replacing numbers with a placeholder would also be very easy with a regex. It worked well enough for my use case, but it’s probably too naive for the general case.
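
A minimal sketch of the number part of that idea, using only Python's standard `re` module. The `<NUM>` token and both function names are hypothetical (the post does not say which placeholder was actually used), and putting the numbers back from left to right assumes the translation keeps them in source order, which will not hold for every language pair.

```python
import re

# Hypothetical placeholder token; the post does not name the one it used.
NUM_TOKEN = "<NUM>"
# Digits, optionally followed by groups such as ",250" or ".50".
NUM_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def mask_numbers(sentence):
    """Replace every number with NUM_TOKEN; return the masked text and the numbers in order."""
    numbers = NUM_PATTERN.findall(sentence)
    masked = NUM_PATTERN.sub(NUM_TOKEN, sentence)
    return masked, numbers


def restore_numbers(translation, numbers):
    """Put the saved numbers back left to right; surplus placeholders are left untouched."""
    remaining = iter(numbers)
    return re.sub(
        re.escape(NUM_TOKEN),
        lambda match: next(remaining, match.group(0)),
        translation,
    )
```

You would call `mask_numbers` on each source sentence before it goes to the model (and on the training data, so the model learns to copy the placeholder), then call `restore_numbers` on the model output with the list saved from the source. Names would follow the same pattern, with the NER system finding the spans instead of a regex.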

Thank you, Sir @vdw. That's what I’m looking for.