WikiText-2 data not the same as official version



Hi All,
After failing to reproduce the perplexity reported on GitHub, I realized that the official WikiText-2 data (which I used) is different from the data published with the word language model example. The PyTorch example’s version differs from the official version by replacing names with <unk>, and occasionally replacing some regular words with <unk> as well. May I ask what exact steps or scripts were used for this transformation? Or is the official WikiText-2 data changing under our feet?
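
In case it helps anyone check the same thing, here is a minimal sketch of how the difference could be quantified. It counts <unk> tokens in each version and lists the words that appear only in the official one. The function names are my own, and the two sample strings are made-up toy lines rather than real WikiText-2 text; in practice you would pass in the contents of the two training files instead.

```python
from collections import Counter


def token_counts(text):
    """Count whitespace-separated tokens in a corpus string."""
    return Counter(text.split())


def compare_corpora(official_text, example_text):
    """Summarize how two versions of a corpus differ in <unk> usage.

    Hypothetical helper for illustration only; it is not part of
    PyTorch or of the WikiText-2 release.
    """
    official = token_counts(official_text)
    example = token_counts(example_text)
    return {
        "official_unk": official["<unk>"],
        "example_unk": example["<unk>"],
        # Words present in the official text but missing from the other version
        "only_in_official": sorted(set(official) - set(example)),
    }


# Made-up toy lines standing in for the two files (not real data)
official_sample = "Alice Smith is an English actor"
example_sample = "<unk> <unk> is an English actor"

report = compare_corpora(official_sample, example_sample)
print(report)
```

A large gap between the two <unk> counts, together with a long list of words found only in the official text, would suggest the example's copy was built with a smaller vocabulary.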
Thanks!