Is ignore_index necessary with pack_padded_sequence?

When handling variable-length input sequences with pack_padded_sequence, do we still need to set the ignore_index parameter of the loss function to avoid computing gradients for the padding elements?

For example, in the image captioning PyTorch tutorial, the forward method of the DecoderRNN is

        embeddings = self.embed(captions)
        embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
        packed = pack_padded_sequence(embeddings, lengths, batch_first=True) 
        hiddens, _ = self.lstm(packed)
        outputs = self.linear(hiddens[0])

But during training, it computes the loss directly on the packed targets, without setting ignore_index:

            # Set mini-batch dataset
            images = images.to(device)
            captions = captions.to(device)
            targets = pack_padded_sequence(captions, lengths, batch_first=True)[0]
            # Forward, backward and optimize
            features = encoder(images)
            outputs = decoder(features, captions, lengths)
            loss = criterion(outputs, targets)
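
To make concrete what the `[0]` of the packed targets contains, here is a minimal sketch with made-up token ids, assuming 0 is the padding index. The tensor values and lengths are illustrative only and are not taken from the tutorial.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Two toy captions padded to length 4; 0 is assumed to be the padding id.
captions = torch.tensor([[5, 6, 7, 8],
                         [9, 10, 0, 0]])
lengths = [4, 2]  # true lengths, sorted in decreasing order

# Index [0] (equivalently .data) is the flat tensor of real tokens only.
targets = pack_padded_sequence(captions, lengths, batch_first=True)[0]

# Tokens are laid out time step by time step, and the padding is dropped.
print(targets.tolist())  # [5, 9, 6, 10, 7, 8]
```

Since the padding ids never appear in `targets`, and the decoder output is built from the packed data in the same way, the loss in this setup is computed over real tokens only.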

My questions are:

Q1: If we use pack_padded_sequence to pack the padded sequences, it seems unnecessary to set ignore_index in the loss function. However, if we then use pad_packed_sequence to unpack the output returned by the RNN, do we need to set ignore_index in the loss function?

Q2: How can I verify the answer to Q1 with a toy program, for example by printing the gradient of each element? Which tensor should I inspect to find out whether gradients are backpropagated through the padding elements?
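
One possible way to set up such a toy check, as a rough sketch rather than a definitive answer: compare the gradients for (a) the loss on packed data, (b) the loss on the unpacked output without ignore_index, and (c) the loss on the unpacked output with ignore_index. The sizes, the `PAD = 0` index, the random tensor `x` standing in for the embeddings, and the helper names `grads` and `unpacked_loss` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
PAD = 0                      # assumed padding index
vocab, dim, hid = 6, 4, 5    # toy sizes

targets = torch.tensor([[1, 2, 3],
                        [4, PAD, PAD]])          # second sequence has length 1
lengths = [3, 1]
x = torch.randn(2, 3, dim, requires_grad=True)   # stands in for the embeddings

lstm = nn.LSTM(dim, hid, batch_first=True)
linear = nn.Linear(hid, vocab)

def grads(loss):
    """Backpropagate `loss`; return copies of x.grad and linear.bias.grad."""
    x.grad = None
    lstm.zero_grad()
    linear.zero_grad()
    loss.backward()
    return x.grad.clone(), linear.bias.grad.clone()

# (a) Loss on packed data only, as in the tutorial.
hiddens, _ = lstm(pack_padded_sequence(x, lengths, batch_first=True))
packed_targets = pack_padded_sequence(targets, lengths, batch_first=True).data
loss_packed = nn.CrossEntropyLoss()(linear(hiddens.data), packed_targets)
gx_packed, gb_packed = grads(loss_packed)

def unpacked_loss(ignore_index=-100):
    """Unpack the LSTM output to a padded tensor, then score every position."""
    hiddens, _ = lstm(pack_padded_sequence(x, lengths, batch_first=True))
    padded, _ = pad_packed_sequence(hiddens, batch_first=True)  # zero-filled
    logits = linear(padded).reshape(-1, vocab)
    criterion = nn.CrossEntropyLoss(ignore_index=ignore_index)
    return criterion(logits, targets.reshape(-1))

# (b) Unpacked, padding positions included in the loss.
gx_plain, gb_plain = grads(unpacked_loss())
# (c) Unpacked, padding positions excluded via ignore_index.
gx_ignore, gb_ignore = grads(unpacked_loss(ignore_index=PAD))

# Gradient reaching the padded inputs of the second sequence.
print("input grad at padding is zero:", bool(torch.all(gx_plain[1, 1:] == 0)))
# Does the output layer see the same gradient as in the packed case?
print("(c) matches (a):", torch.allclose(gb_ignore, gb_packed, atol=1e-6))
print("(b) matches (a):", torch.allclose(gb_plain, gb_packed, atol=1e-6))
```

If my understanding is right, the gradient at the padded input positions is zero in all three cases, because the LSTM never processes them. The difference shows up in the output layer instead: pad_packed_sequence fills the padded positions with zeros, the linear layer maps those to its bias, and without ignore_index those positions still contribute to the loss and to `linear.bias.grad`. So for Q1, `linear.bias.grad` (and the loss value itself) seems a more informative thing to print than the input gradient.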

Thank you so much!

Any updates on this?
