ValueError: Expected input batch_size (32) to match target batch_size (4096)

I’m currently trying to fine-tune an mBERT model on data from a corpus. I’m at the fine-tuning/training part of it all, and this is the code I have:

# BERT fine-tuning parameters
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.0}
]

optimizer = AdamW(
    model.parameters(), lr=2e-5, correct_bias=False
) 
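# Note: the optimizer_grouped_parameters defined above are not passed to AdamW here,
# so the per-group weight decay settings have no effect.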

# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
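    # preds: (batch_size, num_labels) logits; labels: (batch_size,) gold class ids
    # (shapes assumed for one label per sequence)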
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)
  
# Store loss and accuracy for plotting
train_loss_set = []
# Number of training epochs 
epochs = 4

# BERT training loop
for _ in trange(epochs, desc="Epoch"):  
  
  ## TRAINING
  
  # Set model to training mode
  model.train()  
  # Tracking variables
  tr_loss = 0
  nb_tr_examples, nb_tr_steps = 0, 0
  # Train the data for one epoch
  for step, batch in enumerate(train_dataloader):
    # Add batch to GPU
    batch = tuple(t.to(device) for t in batch)
    # Unpack the inputs from our dataloader
    b_input_ids, b_input_mask, b_labels = batch
    # Clear out the gradients (by default they accumulate)
    optimizer.zero_grad()
    # Forward pass
    loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)
    train_loss_set.append(loss.item())    
    # Backward pass
    loss.backward()
    # Update parameters and take a step using the computed gradient
    optimizer.step()
    # Update tracking variables
    tr_loss += loss.item()
    nb_tr_examples += b_input_ids.size(0)
    nb_tr_steps += 1
  print("Train loss: {}".format(tr_loss/nb_tr_steps))
       
  ## VALIDATION

  # Put model in evaluation mode
  model.eval()
  # Tracking variables 
  eval_loss, eval_accuracy = 0, 0
  nb_eval_steps, nb_eval_examples = 0, 0
  # Evaluate data for one epoch
  for batch in validation_dataloader:
    # Add batch to GPU
    batch = tuple(t.to(device) for t in batch)
    # Unpack the inputs from our dataloader
    b_input_ids, b_input_mask, b_labels = batch
    # Telling the model not to compute or store gradients, saving memory and speeding up validation
    with torch.no_grad():
      # Forward pass, calculate logit predictions
      logits = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)    
    # Move logits and labels to CPU
    logits = logits.detach().cpu().numpy()
    label_ids = b_labels.to('cpu').numpy()
    tmp_eval_accuracy = flat_accuracy(logits, label_ids)    
    eval_accuracy += tmp_eval_accuracy
    nb_eval_steps += 1
  print("Validation Accuracy: {}".format(eval_accuracy/nb_eval_steps))

on the line: loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)

I’m getting the error:

ValueError: Expected input batch_size (32) to match target batch_size (4096).

I do get that my problem is a tensor shape mismatch; what I don’t get is why it is happening.

Before this step, the train_dataloader variable is created as follows:

train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

where:

train_data is created by doing: train_data = TensorDataset(train_inputs, train_masks, train_labels)

and:

train_masks = torch.tensor(train_masks)
train_labels = torch.tensor(train_labels)
train_inputs = torch.tensor(train_inputs)

all share the size: torch.Size([14505, 128])

The batch_size is 32 and train_sampler = RandomSampler(train_data)

I’m unsure where this mismatch is happening or why…

I don’t know how your model calculates the loss, but given that the input has a batch size of 32 as expected, I would check if the target is flattened somewhere or if it is already in a wrong shape when returned by the DataLoader.

Do you mean verify these?

train_data = TensorDataset(train_inputs, train_masks, train_labels)

train_dataloader = DataLoader(train_data, sampler=train_sampler, shuffle=True, batch_size=batch_size)

validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, shuffle=True, batch_size=batch_size)

Also, would you mind letting me know where I should check for/do the flattening? I am certain that’s not being done.

Yes, I would check the shapes of the data and target via:

b_input_ids, b_input_mask, b_labels = next(iter(train_dataloader))

and make sure each of them has the expected batch size of 32.
If that’s the case, check the model’s forward method and isolate where the loss is calculated, as it seems your model is responsible for the loss calculation as well.
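
A quick sketch of that check, using the names from your training loop (the expected label shape below assumes one label per sequence, which I’m inferring from your accuracy function, not from code you’ve shown):

b_input_ids, b_input_mask, b_labels = next(iter(train_dataloader))
print(b_input_ids.shape)   # expected: torch.Size([32, 128])
print(b_input_mask.shape)  # expected: torch.Size([32, 128])
print(b_labels.shape)      # expected: torch.Size([32]) if there is one label per sequence
# Since 32 * 128 = 4096, labels coming out as [32, 128] (matching the
# [14505, 128] size you listed for train_labels) would explain the
# "target batch_size (4096)" in the error message.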

I will be doing that, but I have to ask: if I replace all of that with just the following:

# set up training arguments for trainer, leave most at default
training_args = TrainingArguments(
    output_dir='./results',  # output directory
    evaluation_strategy="epoch",
    num_train_epochs=5,  # total # of training epochs
)

# set up trainer
trainer = Trainer(
    model=model,  # the instantiated 🤗 Transformers model to be trained
    args=training_args,  # training arguments, defined above
    train_dataset= train_inputs,  # training dataset
    eval_dataset= validation_inputs
)

trainer.train()

am I losing anything? When I try that I get a different error:

ValueError: You have to specify either input_ids or inputs_embeds

But as far as I can tell, it should do the same thing I was attempting to do, just using the Hugging Face Trainer API, or am I mistaken? Sorry if it’s a random or newbie question; I’m new to ML.

I don’t know, as I’m not familiar enough with the HuggingFace Trainer interface.
Based on the new error, I assume you are missing some needed arguments.
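
That said, going just by the error message: the model apparently never receives input_ids, which suggests the dataset you pass to the Trainer needs to yield dict-like examples (with input_ids, attention_mask, and labels keys) rather than a raw tensor of input ids. Something like the sketch below might be a starting point; it’s untested, and the validation_masks / validation_labels names are my guesses for tensors you haven’t shown:

from torch.utils.data import Dataset

class ClassificationDataset(Dataset):
    # Wraps the already-tokenized tensors so each item is a dict,
    # which is what the Trainer's default data collator expects.
    def __init__(self, input_ids, attention_mask, labels):
        self.input_ids = input_ids
        self.attention_mask = attention_mask
        self.labels = labels  # for sequence classification: one class id per example

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {
            "input_ids": self.input_ids[idx],
            "attention_mask": self.attention_mask[idx],
            "labels": self.labels[idx],
        }

train_dataset = ClassificationDataset(train_inputs, train_masks, train_labels)
# validation_masks / validation_labels are assumed names here
eval_dataset = ClassificationDataset(validation_inputs, validation_masks, validation_labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()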