Transformer LM from the PyTorch documentation performs poorly

I trained a Transformer-based language model using the code from the PyTorch documentation, and the results are worse than those of an RNN-based LM. The code I'm using is from this tutorial: https://pytorch.org/tutorials/beginner/transformer_tutorial.html.

Here are some output examples

<eos> @ settlement heavy of , , lined the she <unk> of . interception the dried . , would his 
= . , losses the and and the <unk> was , the the , <unk> <unk> the and be first 

= 1 was rains ireland and starting with hairy had found the <unk> to possibility heads other which receive gift 
= @ the , , the in the , been in <unk> , the of , , are the , 

valkyria rebounds rapid over . truely their lead <unk> recently at year sharif take of as symptoms usually an for 
chronicles , and the the , first to , been the , , a the the , <unk> average the 

chronicles , in the since and drive were leaves left the , , a carpenter ear may happen index seizing 
iii @ the storm the the , the the the <unk> and and touchdown , , be , , the 

iii 3 particular island the son at built are the site he the 56 becoming ornaments include every @-@ on 
, @ by , republic , the in the <unk> , was <unk> @-@ the , <unk> year <unk> the 

= @ , , 1960s <unk> their during crowded <unk> . was cambridge – a . <unk> few linked telling 
= . the the , , first the , , the a , 0 <unk> the , years to the 

You can see that the generated text doesn't even remotely make sense and is riddled with grammatical mistakes.
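
For context, I'm sampling from the trained model with a simple greedy decoding loop roughly like the one below. The tutorial itself only covers training and perplexity evaluation, so the loop, the `greedy_generate` name, and the `itos` argument are my own rough sketch rather than tutorial code:

```python
import torch

def greedy_generate(model, itos, prompt_ids, steps=20, device="cpu"):
    # `model` is the trained TransformerModel from the tutorial and `itos` is the
    # index-to-token list from its vocab; both come from the tutorial's training script.
    model.eval()
    # The tutorial uses a (seq_len, batch) layout, so add a batch dimension of 1.
    inp = torch.tensor(prompt_ids, dtype=torch.long, device=device).unsqueeze(1)
    with torch.no_grad():
        for _ in range(steps):
            sz = inp.size(0)
            # Causal mask: -inf above the diagonal so a position can't attend to future tokens.
            mask = torch.triu(
                torch.full((sz, sz), float("-inf"), device=device), diagonal=1
            )
            out = model(inp, mask)           # (seq_len, 1, ntokens)
            next_id = out[-1, 0].argmax()    # greedy: most likely token at the last position
            inp = torch.cat([inp, next_id.view(1, 1)], dim=0)
    return " ".join(itos[i] for i in inp.squeeze(1).tolist())
```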

Someone else has posted the same issue on the forums, and it is currently unanswered. Here's the link to that thread: Transformer model of Language Model Example performs much worse

Theoretically, wouldn't a Transformer produce better results than an RNN? What is causing this?