Proper way to implement autoregressive decoder in Seq2Seq tasks

Hi. I am trying to implement an LSTM-based encoder-decoder model for a sequence-to-sequence task. I took inspiration from fairseq and built a decoder with the usual stack of embedding, dropout, recurrent, and linear layers. The forward method looks like this:

hidden = state.hidden                         # recurrent state carried over from the previous call
x = self.embed_tokens(prev_output_tokens)     # (batch, tgt_len) -> (batch, tgt_len, embed_dim)
emb = self.dropout_in_module(x)               # dropout on the embeddings
x, hidden_t = self.rnn(emb, hidden)           # run the sequence through the LSTM
x = self.dropout_out_module(x)                # dropout on the LSTM outputs
return x, State(hidden=hidden_t)              # decoder features plus the updated recurrent state
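
For reference, the rest of the module is set up roughly like this (simplified sketch; the sizes and the plain nn.Dropout are placeholders for my actual configuration):

import torch.nn as nn

class LSTMDecoder(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=512, hidden_dim=512, dropout=0.3):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, embed_dim)
        self.dropout_in_module = nn.Dropout(dropout)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.dropout_out_module = nn.Dropout(dropout)
        # maps decoder features to vocabulary logits
        self.output_projection = nn.Linear(hidden_dim, vocab_size)
    # forward(...) is the snippet shown above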

Here prev_output_tokens is a batch of target sequences shifted one position to the right and starting with eos_token.
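
Concretely, I build it roughly like this (the function name and eos handling are just for illustration):

import torch

def shift_right(target_tokens, eos_idx):
    # target_tokens: (batch, tgt_len) tensor of gold token ids.
    # Prepend eos and drop the last position, so position t of the
    # decoder input holds the gold token at position t-1.
    batch_size = target_tokens.size(0)
    eos_col = target_tokens.new_full((batch_size, 1), eos_idx)
    return torch.cat([eos_col, target_tokens[:, :-1]], dim=1)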
Currently, my generations (using greedy search) are very poor. When I looked into other implementations, I found several that handle the generation step more explicitly, feeding back one token at a time, such as the one in the following excerpt:

for di in range(max_length):
    # Run the decoder for a single step with the previously generated token
    decoder_output, decoder_hidden, decoder_attention = decoder(
        decoder_input, decoder_hidden, encoder_outputs)
    decoder_attentions[di] = decoder_attention.data
    # Greedy search: pick the highest-scoring token
    topv, topi = decoder_output.data.topk(1)
    if topi.item() == EOS_token:
        decoded_words.append('<EOS>')
        break
    else:
        decoded_words.append(output_lang.index2word[topi.item()])
    # Feed the predicted token back in as the next input
    decoder_input = topi.squeeze().detach()
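
For comparison, this is roughly how I would expect my own decoder to be used at inference time, feeding one predicted token back per step. This is only a sketch: it assumes my forward signature is forward(prev_output_tokens, state), batch-first tensors, and an external output projection; the encoder-state plumbing is omitted.

import torch

@torch.no_grad()
def greedy_decode(decoder, output_projection, init_state, eos_idx, max_length=100):
    # init_state: State built from the encoder's final hidden state (assumption).
    # Start from eos (fairseq convention) and feed back the argmax each step.
    prev_token = torch.full((1, 1), eos_idx, dtype=torch.long)
    state = init_state
    decoded = []
    for _ in range(max_length):
        x, state = decoder(prev_token, state)      # x: (1, 1, hidden_dim), batch_first assumed
        logits = output_projection(x[:, -1, :])    # (1, vocab_size)
        next_token = logits.argmax(dim=-1)         # (1,)
        if next_token.item() == eos_idx:
            break
        decoded.append(next_token.item())
        prev_token = next_token.view(1, 1)         # predicted token becomes the next input
    return decoded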

Can someone tell me what the best strategy is, and why mine (which I believe matches fairseq) does not work, or whether it does work and I am doing something wrong elsewhere? Thanks in advance for any help you can provide.