Prediction function in an NLP classification problem without using MLM

I am working on an NLP classification task using a Transformer. I have written a function that predicts the output for the test data, and it works well when I use MLM. However, when I do not use MLM in the training process, I get the error shown below.

The prediction function is

# Prediction function with MLM

import torch
import torch.nn.functional as F

def bert_predict(model, test_dataloader):
    """
    Perform a forward pass on the trained BERT model to predict probabilities
    on the test set.
    """
    # Put the model into the evaluation mode. The dropout layers are disabled during
    # the test time.
    model.eval()

    all_logits = []

    # For each batch in our test set...
    for batch in test_dataloader:
        # Load batch to GPU
        b_input_ids, b_attn_mask = tuple(t.to(device) for t in batch)[:2]

        # Compute logits
        with torch.no_grad():
            logits = model(b_input_ids, b_attn_mask)
        all_logits.append(logits)
    
    # Concatenate logits from each batch
    all_logits = torch.cat(all_logits, dim=0)

    # Apply softmax to calculate probabilities
    probs = F.softmax(all_logits, dim=1).cpu().numpy()

    return probs

# Prediction function without MLM

import torch
import torch.nn.functional as F

def bert_predict(model, test_dataloader):
    """
    Perform a forward pass on the trained BERT model to predict probabilities
    on the test set.
    """
    # Put the model into the evaluation mode. The dropout layers are disabled during
    # the test time.
    model.eval()

    all_logits = []

    # For each batch in our test set...
    for batch in test_dataloader:
        # Load batch to GPU
        b_input_ids = tuple(t.to(device) for t in batch)[:1]

        # Compute logits
        with torch.no_grad():
            logits = model(b_input_ids)
        all_logits.append(logits)
    
    # Concatenate logits from each batch
    all_logits = torch.cat(all_logits, dim=0)

    # Apply softmax to calculate probabilities
    probs = F.softmax(all_logits, dim=1).cpu().numpy()

    return probs

The test DataLoader is

# Create the DataLoader for our testing set
test_data = TensorDataset(test_inputs, test_masks, test_labels)
test_sampler = SequentialSampler(test_data)
test_dataloader = DataLoader(test_data, sampler=test_sampler, batch_size=batch_size)

The model, which is also fine-tuned, is the following

# Instantiate Bert Classifier
bert_classifier = BertClassifier(freeze_bert=False)

# Tell PyTorch to run the model on GPU
bert_classifier.to(device)

The error is the following:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-67-5f46deb2534d> in <module>()
      1 # Compute predicted probabilities on the test set
----> 2 probs = bert_predict(bert_classifier, test_dataloader)

3 frames
<timed exec> in forward(self, input_ids)

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    942             raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    943         elif input_ids is not None:
--> 944             input_shape = input_ids.size()
    945         elif inputs_embeds is not None:
    946             input_shape = inputs_embeds.size()[:-1]

AttributeError: 'tuple' object has no attribute 'size'

It seems you are explicitly creating a tuple here:

b_input_ids = tuple(t.to(device) for t in batch)[:1]

and passing it to the model:

logits = model(b_input_ids)

while a tensor might be expected:

--> 944             input_shape = input_ids.size()
    945         elif inputs_embeds is not None:
    946             input_shape = inputs_embeds.size()[:-1]

AttributeError: 'tuple' object has no attribute 'size'
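
To make the difference concrete, here is a minimal sketch of slicing versus indexing a batch. The tensor shapes and the CPU device are assumptions chosen only so the snippet runs on its own; the dataset layout mirrors the TensorDataset from the question.

```python
import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset

# Tiny stand-in data (assumed shapes): 4 sequences of length 8.
test_inputs = torch.randint(0, 100, (4, 8))
test_masks = torch.ones(4, 8, dtype=torch.long)
test_labels = torch.zeros(4, dtype=torch.long)

test_data = TensorDataset(test_inputs, test_masks, test_labels)
test_dataloader = DataLoader(
    test_data, sampler=SequentialSampler(test_data), batch_size=2
)

device = torch.device("cpu")  # assumption: use "cuda" if a GPU is available

batch = next(iter(test_dataloader))

# Slicing a tuple returns another tuple, even if it holds a single element.
sliced = tuple(t.to(device) for t in batch)[:1]
print(type(sliced), hasattr(sliced, "size"))

# Indexing returns the tensor itself, which is what forward() calls .size() on.
b_input_ids = batch[0].to(device)
print(type(b_input_ids), b_input_ids.size())
```

With `[:2]` the original code unpacked the two-element tuple into two names, so each name was bound to a tensor. With `[:1]` there is no unpacking, so the name stays bound to the one-element tuple.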

I don’t know what MLM refers to, but do you know if it processes the tuple input somehow and passes tensors to the model?
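
If the model's forward method really takes only input_ids, one possible way to adapt the prediction loop is sketched below. BertClassifier is not shown in the thread, so TinyClassifier is a hypothetical stand-in with the same assumed forward(input_ids) signature, and the data shapes are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for BertClassifier; forward takes input_ids only."""

    def __init__(self, vocab_size=100, hidden=16, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, input_ids):
        # Mean-pool the token embeddings, then project to class logits.
        return self.head(self.embed(input_ids).mean(dim=1))


def bert_predict(model, test_dataloader):
    """Return class probabilities for every example in the dataloader."""
    model.eval()  # disable dropout for inference
    all_logits = []
    for batch in test_dataloader:
        # Index the batch so a tensor, not a one-element tuple, is passed on.
        b_input_ids = batch[0].to(device)
        with torch.no_grad():
            logits = model(b_input_ids)
        all_logits.append(logits)
    all_logits = torch.cat(all_logits, dim=0)
    return F.softmax(all_logits, dim=1).cpu().numpy()


# Assumed toy data: 5 sequences of length 8, with masks and labels.
test_data = TensorDataset(
    torch.randint(0, 100, (5, 8)),
    torch.ones(5, 8, dtype=torch.long),
    torch.zeros(5, dtype=torch.long),
)
test_dataloader = DataLoader(
    test_data, sampler=SequentialSampler(test_data), batch_size=2
)

model = TinyClassifier().to(device)
probs = bert_predict(model, test_dataloader)
print(probs.shape)
```

The only change to the loop from the question is `batch[0].to(device)` in place of the `[:1]` slice; everything else follows the original function.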