I was watching some very good videos by Aladdin Persson on YouTube, where he builds a simple sequence-to-sequence model for machine translation with teacher forcing. I have since adapted this model for time-series analysis, but the translation example illustrates the problem just as well. The original code is below. The key issue is that, because of teacher forcing, the forward() method of the Seq2Seq module takes both the input sentence and the label, i.e. the correct answer.
My question is about actual inference: at that point I won't have a label, only the input sentence. But the model still expects to be called as model(input, label), and there is no label to provide. What is the right way to deal with that?
Here is the code.
import random

import torch
import torch.nn as nn

# Note: english (the target-language vocabulary object) and device are
# defined elsewhere in the original script.

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, source, target, teacher_force_ratio=0.5):
        batch_size = source.shape[1]
        target_len = target.shape[0]
        target_vocab_size = len(english.vocab)

        outputs = torch.zeros(target_len, batch_size, target_vocab_size).to(device)

        hidden, cell = self.encoder(source)

        # Grab the first input to the decoder, which will be the <SOS> token
        x = target[0]

        for t in range(1, target_len):
            # Use the previous hidden, cell as context (from the encoder at the start)
            output, hidden, cell = self.decoder(x, hidden, cell)

            # Store the prediction for this time step
            outputs[t] = output

            # Get the best word the decoder predicted (index in the vocabulary)
            best_guess = output.argmax(1)

            # With probability teacher_force_ratio we feed in the actual next word,
            # otherwise we feed in the word the decoder just predicted.
            # Teacher forcing is used so that the model gets used to seeing
            # similar inputs at training and testing time; if teacher_force_ratio
            # were 1, inputs at test time might look completely different from
            # what the network saw during training.
            x = target[t] if random.random() < teacher_force_ratio else best_guess

        return outputs
As you can see, the forward() function takes a source and a target, where the source is the input sentence and the target is the ground-truth translation. So I have to use the model as below.
model = Seq2Seq(encoder_net, decoder_net).to(device)
prediction = model(data, label)
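At inference time, what I would actually like to call is something like the following (data here is just a placeholder for a batch of tokenized source sentences; there is no label):

prediction = model(data)  # TypeError: forward() missing 1 required positional argument: 'target'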
Can anyone explain how to do inference on a sequence-to-sequence model, or whether there is a better way to train or write these models to deal with teacher forcing? Thanks.
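For what it's worth, the only workaround I could come up with is to make target optional and fall back to greedy decoding, feeding the model's own predictions back in. Below is a sketch only: it assumes a fixed max_len cutoff instead of a real <EOS> stopping check, and it assumes english.vocab.stoi["<sos>"] gives the start-token index (as in torchtext 0.x); both of those may need adapting. Is something like this the right direction?

    def forward(self, source, target=None, teacher_force_ratio=0.5, max_len=50):
        batch_size = source.shape[1]
        # At inference time there is no target, so decode for a fixed
        # maximum length instead of target.shape[0].
        target_len = target.shape[0] if target is not None else max_len
        target_vocab_size = len(english.vocab)

        outputs = torch.zeros(target_len, batch_size, target_vocab_size).to(device)
        hidden, cell = self.encoder(source)

        if target is not None:
            # Training: take the <SOS> row from the label, as before
            x = target[0]
        else:
            # Inference: build a batch of <SOS> tokens ourselves
            # (assumes a torchtext-style vocab with a stoi lookup)
            sos_idx = english.vocab.stoi["<sos>"]
            x = torch.full((batch_size,), sos_idx, dtype=torch.long, device=device)

        for t in range(1, target_len):
            output, hidden, cell = self.decoder(x, hidden, cell)
            outputs[t] = output
            best_guess = output.argmax(1)

            if target is not None and random.random() < teacher_force_ratio:
                x = target[t]   # teacher forcing (training only)
            else:
                x = best_guess  # greedy decoding; always used at inference

        return outputs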