Improving NMT model outputs

Aiman_Mutasem-bellh · August 5, 2020, 11:27am

Dear @All

I have an NMT project based on the Transformer, and everything is working fine. In the final translation output, however, the model translates numbers to zeros (0000) and non-English names to <UNK>.

I need to solve these issues to increase my BLEU score. Any suggestions?

vdw · August 5, 2020, 12:33pm

Rare words or out-of-vocabulary words are a fundamental challenge for NMT. You still find very recent academic papers addressing this.

For example, for a very simple NMT task, I used an off-the-shelf NER system to replace, say, person names. So the 2 sentences “I met Alice” and “I met Bob” would both be converted to “I met” followed by a placeholder token; same for the target sentences. After the translation, I would simply replace the placeholder with the actual name. Replacing numbers with a placeholder would also be very easy with a RegEx. It worked fine enough for my use case, but it’s probably too naive for the general case.
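A minimal sketch of the number part of this idea, in Python. The token name `<NUM>`, the helper names `mask_numbers` and `unmask_numbers`, and the regex itself are illustrative assumptions, not what was used in the original setup. It also assumes the numbers appear in the same order in the translation as in the source, which will not hold for every language pair, and that the placeholder is kept as a single token in the model's vocabulary.

```python
import re

# Assumed pattern: digits, optionally with "." or "," separators inside (e.g. 1,250.50).
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")


def mask_numbers(sentence, token="<NUM>"):
    """Replace each number with a placeholder and return the originals in order."""
    numbers = NUM_RE.findall(sentence)
    masked = NUM_RE.sub(token, sentence)
    return masked, numbers


def unmask_numbers(translation, numbers, token="<NUM>"):
    """Put the saved numbers back into the translation, left to right."""
    remaining = iter(numbers)
    # If the model emits more placeholders than were saved, the extras are left as-is.
    return re.sub(re.escape(token), lambda m: next(remaining, token), translation)


# Mask before training/translation, restore afterwards.
masked, saved = mask_numbers("I paid 1,250.50 dollars for 3 tickets")
restored = unmask_numbers("J'ai payé <NUM> dollars pour <NUM> billets", saved)
```

The same mask-and-restore pattern would apply to names, with the spans coming from an NER system instead of a regex.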

Aiman_Mutasem-bellh · August 5, 2020, 12:51pm

Thank you, Sir @vdw. That's what I’m looking for.