I was watching some very good videos by Aladdin Persson on YouTube, where he shows a simple sequence-to-sequence model for machine translation with teacher forcing. Technically I adapted this model for time-series analysis, but the example is fine. The original code is below. The key issue is that, because of teacher forcing, the `forward()` method of the `Seq2Seq` module takes both the input sentence and the label, meaning the correct answer.
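To make the dependency on the label concrete, here is a minimal sketch of the per-step choice the decoding loop makes. The helper `next_decoder_input` is hypothetical and not part of the tutorial:

```python
import random


def next_decoder_input(true_token, predicted_token, teacher_force_ratio, rng=random):
    """Pick the decoder's next input the way the Seq2Seq loop does.

    With probability `teacher_force_ratio` the ground-truth token (the label)
    is fed back; otherwise the decoder's own best guess is used.
    """
    # rng.random() is in [0, 1), so ratio 1.0 always picks the label
    # and ratio 0.0 always picks the model's prediction.
    if rng.random() < teacher_force_ratio:
        return true_token
    return predicted_token
```

Note that even at a ratio of 0, the original `forward()` still reads `target` to get `target_len` and the first `<SOS>` input, which is why it cannot be called without a label.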
My question is: for actual inference, I won't have a label. During inference I will only have the input sentence. So when I try to run the model, it will expect `model(input, label)`, and there is no label to provide. What is the way to deal with that?
Here is the code.
```python
import random

import torch
import torch.nn as nn

# `english` (holds the target-language vocabulary) and `device` are defined
# elsewhere in the original tutorial script.


class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, source, target, teacher_force_ratio=0.5):
        # source: (src_len, batch_size), target: (target_len, batch_size)
        batch_size = source.shape[1]
        target_len = target.shape[0]
        target_vocab_size = len(english.vocab)

        outputs = torch.zeros(target_len, batch_size, target_vocab_size).to(device)

        hidden, cell = self.encoder(source)

        # Grab the first input to the Decoder which will be <SOS> token
        x = target[0]

        for t in range(1, target_len):
            # Use previous hidden, cell as context from encoder at start
            output, hidden, cell = self.decoder(x, hidden, cell)

            # Store next output prediction
            outputs[t] = output

            # Get the best word the Decoder predicted (index in the vocabulary)
            best_guess = output.argmax(1)

            # With probability of teacher_force_ratio we take the actual next word,
            # otherwise we take the word that the Decoder predicted it to be.
            # Teacher Forcing is used so that the model gets used to seeing
            # similar inputs at training and testing time. If teacher forcing is 1,
            # then inputs at test time might be completely different than what the
            # network is used to. This was a long comment.
            x = target[t] if random.random() < teacher_force_ratio else best_guess

        return outputs
```
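A common pattern, sketched here rather than taken from the tutorial, is to make `target` optional: when it is missing, decode for a fixed `max_len` steps, start from an `<SOS>` index passed in at construction, and always feed back the model's own prediction. `Encoder`, `Decoder`, `sos_idx` and `max_len` are hypothetical toy stand-ins assumed for illustration; shapes follow the `(seq_len, batch_size)` convention used above.

```python
import random

import torch
import torch.nn as nn


class Encoder(nn.Module):
    # Hypothetical toy encoder: embedding + single-layer LSTM.
    def __init__(self, vocab_size, emb_dim=16, hidden_dim=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim)

    def forward(self, source):  # source: (src_len, N)
        _, (hidden, cell) = self.rnn(self.embedding(source))
        return hidden, cell


class Decoder(nn.Module):
    # Hypothetical toy decoder: consumes one token per step, returns vocab logits.
    def __init__(self, vocab_size, emb_dim=16, hidden_dim=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden, cell):  # x: (N,)
        out, (hidden, cell) = self.rnn(self.embedding(x).unsqueeze(0), (hidden, cell))
        return self.fc(out.squeeze(0)), hidden, cell  # logits: (N, vocab)


class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, vocab_size, sos_idx):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder
        self.vocab_size, self.sos_idx = vocab_size, sos_idx

    def forward(self, source, target=None, teacher_force_ratio=0.5, max_len=50):
        batch_size = source.shape[1]
        # Without a target: decode max_len steps and never teacher-force.
        target_len = target.shape[0] if target is not None else max_len
        if target is None:
            teacher_force_ratio = 0.0
        outputs = torch.zeros(target_len, batch_size, self.vocab_size, device=source.device)
        hidden, cell = self.encoder(source)
        # First decoder input is <SOS>, built from sos_idx instead of target[0].
        x = torch.full((batch_size,), self.sos_idx, dtype=torch.long, device=source.device)
        for t in range(1, target_len):
            output, hidden, cell = self.decoder(x, hidden, cell)
            outputs[t] = output
            best_guess = output.argmax(1)
            use_label = target is not None and random.random() < teacher_force_ratio
            x = target[t] if use_label else best_guess
        return outputs
```

At inference you would call `model.eval()`, wrap the call in `torch.no_grad()`, and pass only the source, e.g. `model(data, max_len=...)`. Teacher forcing then only affects training; at inference the loop is purely autoregressive.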
As you can see, the `forward()` function takes a `source` and a `target`, where the source is the input sentence and the target is the actual translated sentence. So I have to call the model like this:
```python
model = Seq2Seq(encoder_net, decoder_net).to(device)
prediction = model(data, label)
```
Can anyone explain how to do inference with a sequence-to-sequence model, or whether there is a better way to train or write these models to handle teacher forcing? Thanks.
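If decoding should stop as soon as the model emits `<EOS>` (typical for batch size 1), the inference loop can also be written separately from `forward()`. Below is a framework-agnostic sketch of greedy decoding; `greedy_decode`, `step`, `sos_idx`, `eos_idx` and `toy_step` are hypothetical names. With the model above, `step` would wrap a call to `self.decoder(x, hidden, cell)`, with `state = (hidden, cell)` coming from the encoder.

```python
from typing import Any, Callable, List, Sequence, Tuple

# step(token, state) -> (scores, new_state); scores[i] is the score for vocab index i.
StepFn = Callable[[int, Any], Tuple[Sequence[float], Any]]


def greedy_decode(step: StepFn, state: Any, sos_idx: int, eos_idx: int, max_len: int) -> List[int]:
    """Autoregressive greedy decoding: each prediction becomes the next input.

    No label is needed, only the encoder-derived initial state and <SOS>.
    """
    tokens: List[int] = []
    x = sos_idx
    for _ in range(max_len):
        scores, state = step(x, state)
        # argmax over the vocabulary
        x = max(range(len(scores)), key=lambda i: scores[i])
        if x == eos_idx:
            break  # stop as soon as the model predicts <EOS>
        tokens.append(x)
    return tokens


def toy_step(token: int, state: Any) -> Tuple[List[float], Any]:
    # Hypothetical stand-in decoder over a 6-token vocabulary (EOS = 5):
    # predicts token + 1, then EOS once token 4 has been produced.
    nxt = token + 1 if token < 4 else 5
    scores = [0.0] * 6
    scores[nxt] = 1.0
    return scores, state


print(greedy_decode(toy_step, None, sos_idx=0, eos_idx=5, max_len=10))  # [1, 2, 3, 4]
```

Greedy argmax is the simplest choice; beam search, which keeps several candidate sequences per step, is a common alternative when greedy output is poor.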