BertForMultipleChoice works without captum, breaks with it?

Hi, I’m trying to run my first experiments with captum, using a pretrained BertForMultipleChoice model on this published Caseholder dataset and following this BERT SQUAD Captum tutorial. I’ve posted my notebook on Colab if you’re kind enough to look!

My first issue is working out what “ground truth” means here. As this is a multiple choice task, I figured the correct answer would be the ground truth? But this fails when the tutorial code tries to get the indices of the ground-truth tokens: ground_truth_end_ind = indices.index(ground_truth_tokens[-1])

ValueError: 7607 is not in list
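
For context, this ValueError is what Python's list.index raises whenever the value is not in the list. Here is a minimal sketch with made-up token ids (the ids and the helper name find_span are hypothetical, not taken from the tutorial or my notebook) of one way to check whether the ground-truth tokens occur in the encoded input at all before indexing:

```python
def find_span(indices, ground_truth_tokens):
    """Return (start, end) of the first contiguous match, or None if absent."""
    n = len(ground_truth_tokens)
    if n == 0:
        return None
    for start in range(len(indices) - n + 1):
        if indices[start:start + n] == ground_truth_tokens:
            return start, start + n - 1
    return None


# Hypothetical token ids, for illustration only.
indices = [101, 2054, 2003, 1996, 3437, 102]

print(find_span(indices, [1996, 3437]))  # tokens are present in the input
print(find_span(indices, [7607]))        # id never appears in the input

# The tutorial-style lookup raises instead of returning None.
try:
    indices.index(7607)
except ValueError as err:
    print("list.index raised:", err)
```

If a check like this returns None on the real data, the ground-truth text was presumably not part of the tokenized input (for example, it was truncated or encoded separately), which may be worth ruling out first.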

Question#1: What should ground truth be for this dataset?

To get a bit further, I just reused the same text for ground truth. But this fails when it tries to make a prediction:

start_scores, end_scores = predict(input_ids, \
                               token_type_ids=token_type_ids, \
                               position_ids=position_ids, \

I traced this error to modeling_bert.BertForMultipleChoice.forward() (cf. lines 1662-1690).

In previous experiments without captum, input_ids comes in with shape [16,5,128], so input_ids.size(-1) = 128 and num_choices = 5, and input_ids is reshaped to [80,128]. Then pooled_output.shape=[80, 128], logits.shape=[80,1], and reshaped_logits is computed correctly, with shape=[16,5].

But with captum, the shape of input_ids is left unchanged by the reshape, and num_choices = 260. Then pooled_output.shape=[1,768], logits.shape=[1,1], and the attempt to compute reshaped_logits fails with this error:

RuntimeError: shape '[-1, 260]' is invalid for input of size 1
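
The shape arithmetic can be sketched without the model. This is only an illustration, assuming forward() takes num_choices from dimension 1 of input_ids, flattens input_ids to two dimensions, and then views the logits as [-1, num_choices]. The function name multiple_choice_shapes is hypothetical, and plain tuples stand in for tensors:

```python
from math import prod


def multiple_choice_shapes(input_shape):
    """Mimic the reshape steps on a shape tuple; return the reshaped_logits shape."""
    num_choices = input_shape[1]          # assumption: read from dimension 1
    seq_len = input_shape[-1]
    # Equivalent of input_ids.view(-1, seq_len): count the flattened rows.
    flat_rows = prod(input_shape) // seq_len
    n_logits = flat_rows                  # one logit per flattened row
    # Equivalent of logits.view(-1, num_choices): sizes must divide evenly.
    if n_logits % num_choices != 0:
        raise ValueError(
            f"shape '[-1, {num_choices}]' is invalid for input of size {n_logits}"
        )
    return (n_logits // num_choices, num_choices)


# 3-D input [batch, num_choices, seq_len], as in the run without captum.
print(multiple_choice_shapes((16, 5, 128)))

# 2-D input [1, seq_len]: the sequence length is read as num_choices.
try:
    multiple_choice_shapes((1, 260))
except ValueError as err:
    print(err)
```

Under these assumptions, a 2-D input of shape [1, 260] makes the sequence length double as num_choices, which matches the numbers I see, while a 3-D input of shape [batch, num_choices, seq_len] goes through.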

Question#2: Why is BertForMultipleChoice.forward() behaving differently with captum?

Thanks for any help. captum looks hugely useful, can’t wait to make use of it!