Hi. I am trying to implement an LSTM-based encoder-decoder model for sequence-to-sequence learning. I took inspiration from fairseq
and built a decoder with the conventional embedding, dropout, recurrent, and linear layers. The forward method looks like this:
def forward(self, prev_output_tokens, state):
    hidden = state.hidden                        # hidden state carried over from the previous call
    x = self.embed_tokens(prev_output_tokens)    # token ids -> embeddings
    emb = self.dropout_in_module(x)
    x, hidden_t = self.rnn(emb, hidden)          # LSTM over the whole (shifted) target sequence
    x = self.dropout_out_module(x)
    return x, State(hidden=hidden_t)
Here prev_output_tokens is a batch containing the target sentences shifted one position to the right, starting with the eos token (teacher forcing, as in fairseq).
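To make the shifting convention concrete, here is a minimal sketch of how such decoder inputs can be built (the helper name `shift_right` is mine, not from fairseq; fairseq does this internally when building prev_output_tokens):

```python
def shift_right(target, eos_token):
    """Build teacher-forcing inputs from a target sequence.

    target ends with eos; the decoder input is eos followed by every
    target token except that final eos, i.e. the target rolled right.
    """
    return [eos_token] + target[:-1]

# Example with hypothetical ids: target "a b c </s>" = [4, 5, 6, 2], eos = 2
prev_output_tokens = shift_right([4, 5, 6, 2], eos_token=2)
# prev_output_tokens is now [2, 4, 5, 6]
```

At training time the model then predicts target[t] from prev_output_tokens[:t+1].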
Currently, my generations (I am using greedy search) are very poor. When I started looking into other implementations, I found some that handle the generation step more explicitly, like this one:
for di in range(max_length):
    decoder_output, decoder_hidden, decoder_attention = decoder(
        decoder_input, decoder_hidden, encoder_outputs)
    decoder_attentions[di] = decoder_attention.data
    topv, topi = decoder_output.data.topk(1)
    if topi.item() == EOS_token:
        decoded_words.append('<EOS>')
        break
    else:
        decoded_words.append(output_lang.index2word[topi.item()])
        decoder_input = topi.squeeze().detach()
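For reference, this is how I understand that loop would look with a decoder shaped like mine: one token per step, carrying the hidden state forward and feeding the argmax prediction back in. This is a self-contained sketch with made-up sizes and an added output projection (my assumption; the question's snippet stops at the dropout), not my actual model:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Same embed -> dropout -> LSTM -> dropout layout as above,
    plus a linear projection to get per-token logits."""
    def __init__(self, vocab_size=10, embed_dim=8, hidden_dim=8):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, embed_dim)
        self.dropout_in_module = nn.Dropout(0.1)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.dropout_out_module = nn.Dropout(0.1)
        self.output_projection = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_output_tokens, hidden=None):
        x = self.embed_tokens(prev_output_tokens)
        x = self.dropout_in_module(x)
        x, hidden = self.rnn(x, hidden)
        x = self.dropout_out_module(x)
        return self.output_projection(x), hidden

eos_token = 2
decoder = Decoder()
decoder.eval()  # disable dropout at generation time

decoder_input = torch.tensor([[eos_token]])  # (batch=1, seq=1): start with eos
hidden = None  # in the real model this would come from the encoder
decoded = []
with torch.no_grad():
    for _ in range(20):  # max_length
        logits, hidden = decoder(decoder_input, hidden)
        topi = logits[:, -1].argmax(dim=-1)   # greedy pick at the last step
        if topi.item() == eos_token:
            break
        decoded.append(topi.item())
        decoder_input = topi.unsqueeze(0)     # (1, 1) input for the next step
```

The key points compared to the training-time forward are: only the previously generated token is fed in (not the gold target), the hidden state is threaded through the loop, and dropout must be off (`eval()`), otherwise greedy search degrades badly.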
Can someone tell me what the best strategy is, and why mine (the same as fairseq's, I suppose) does not work? Or perhaps it does work and I am doing something else wrong. Thanks in advance for any help you can provide.