Hi, I’m using an LSTM for image captioning with packed inputs (see the snippet below) and I’m wondering what the proper way of testing it is. Can I just set the model to eval and feed it the test set, or should I write a custom sampling loop? I’m asking because I’m not sure how information is passed between hidden states when using packed inputs, and I was wondering whether it does “teacher forcing” (i.e. passes the target instead of the prediction), which would be a problem when testing.
While trying to understand the LSTM implementation, I found this picture:
Judging by it, I would say that during the forward pass the inputs to the memory units at time t are tag_t, hiddenstate_t, and cellstate_t, which makes me think that the prediction at time t+1 is made using the ground-truth tag at time t rather than the model’s own prediction (I would call this “teacher forcing”).
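To make the teacher-forcing interpretation concrete, here is a minimal sketch of a packed-input forward pass (hypothetical sizes and toy captions, not the actual model). The key point is that the input at every timestep is the embedded ground-truth tag, so the output at step t never sees the model’s own prediction from step t-1:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Hypothetical sizes for illustration only
vocab_size, embed_size, hidden_size = 10, 8, 16
embed = nn.Embedding(vocab_size, embed_size)
lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
fc = nn.Linear(hidden_size, vocab_size)

# Toy ground-truth captions (already tokenized), sorted by length; 0 = pad
captions = torch.tensor([[1, 4, 5, 2],
                         [1, 3, 2, 0]])
lengths = [4, 3]

# Teacher forcing: the LSTM input at step t is the ground-truth token at
# step t, regardless of what the model predicted at step t-1.
packed = pack_padded_sequence(embed(captions), lengths, batch_first=True)
hiddens, _ = lstm(packed)

# hiddens.data holds one row per non-padded timestep (sum(lengths) rows)
logits = fc(hiddens.data)
```

So at training/eval time with packed targets, every step is conditioned on the ground truth, which is exactly the teacher-forcing behavior described above.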
When I do greedy 1-sampling, I iteratively feed the predicted tag and the previous hidden/cell states to the memory unit to predict the next tag.
Since the accuracy I achieve with sampling is lower than the accuracy of the forward pass, I want to make sure I understand the reason, which I’m led to believe is the ground-truth tag vs. predicted tag difference.
Do you think that’s not the case?
I asked because I checked the documentation but wasn’t sure what the mentioned “input_t” is when an LSTM is used in combination with packed inputs, so I wanted to make sure packed inputs are handled the way I think they are.