Bleu Score in torchtext

Tejan_Mehndiratta · May 17, 2021, 1:29am

Hi,

In the example given on the pytorch website for calculating the bleu_score:

>>> from torchtext.data.metrics import bleu_score
>>> candidate_corpus = [['My', 'full', 'pytorch', 'test'], ['Another', 'Sentence']]
>>> references_corpus = [[['My', 'full', 'pytorch', 'test'], ['Completely', 'Different']], [['No', 'Match']]]
>>> bleu_score(candidate_corpus, references_corpus)
    0.8408964276313782

My question is:
For the sentence ['My', 'full', 'pytorch', 'test'], is the BLEU calculated against [['My', 'full', 'pytorch', 'test'], ['Completely', 'Different']], and for the sentence ['Another', 'Sentence'], against [['No', 'Match']]? Is that right?

mmg · June 21, 2021, 3:54am

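Refer to this post. The scores are calculated for 1-, 2-, 3- and 4-gram sequences, then (sort of) averaged out: the four n-gram precisions are combined with a geometric mean and scaled by a brevity penalty.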
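
To make the pairing concrete, here is a minimal pure-Python sketch of a corpus-level BLEU with uniform weights. It is not torchtext's source code; the helper names `ngram_counts` and `corpus_bleu_sketch` are made up for illustration, and the logic assumes the standard BLEU definition (clipped n-gram counts pooled over the whole corpus, closest reference length for the brevity penalty). Each candidate is matched only against the reference list at the same index, but the counts are summed across all sentences before the precisions are computed, so the result is not an average of per-sentence scores.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every n-gram of order 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def corpus_bleu_sketch(candidate_corpus, references_corpus, max_n=4):
    """Hypothetical helper: corpus-level BLEU with uniform weights."""
    clipped = [0] * max_n  # matched n-grams per order, clipped by reference counts
    total = [0] * max_n    # all candidate n-grams per order
    cand_len = 0
    ref_len = 0

    # Candidate i is compared only with references_corpus[i].
    for candidate, refs in zip(candidate_corpus, references_corpus):
        cand_len += len(candidate)
        # Use the reference whose length is closest to the candidate's.
        ref_len += min((len(r) for r in refs),
                       key=lambda length: abs(len(candidate) - length))

        cand_counts = ngram_counts(candidate, max_n)
        max_ref_counts = Counter()
        for ref in refs:
            # Counter union keeps the maximum count seen in any reference.
            max_ref_counts |= ngram_counts(ref, max_n)

        # Counter intersection keeps the minimum, i.e. the clipped count.
        for ngram, count in (cand_counts & max_ref_counts).items():
            clipped[len(ngram) - 1] += count
        for ngram, count in cand_counts.items():
            total[len(ngram) - 1] += count

    if min(clipped) == 0:
        return 0.0  # any order with no matches gives a score of zero

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


candidate_corpus = [['My', 'full', 'pytorch', 'test'], ['Another', 'Sentence']]
references_corpus = [[['My', 'full', 'pytorch', 'test'], ['Completely', 'Different']],
                     [['No', 'Match']]]
score = corpus_bleu_sketch(candidate_corpus, references_corpus)
print(round(score, 4))
```

For the example in the question, the pooled precisions are 4/6 (unigrams), 3/4 (bigrams), 2/2 (trigrams) and 1/1 (4-grams), and the brevity penalty is 1 because candidate and reference lengths both sum to 6. The geometric mean is (4/6 · 3/4)^(1/4) = 0.5^(1/4) ≈ 0.8409, which agrees with the value shown in the docs example. Scoring ['Another', 'Sentence'] on its own against [['No', 'Match']] gives 0, since nothing matches.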