I’m currently trying to train an mBERT model on data from a corpus. I am at the fine-tuning stage, and this is the code I have:
# BERT fine-tuning parameters
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.0}
]
optimizer = AdamW(
    model.parameters(), lr=2e-5, correct_bias=False
)

# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

# Store loss and accuracy for plotting
train_loss_set = []
# Number of training epochs
epochs = 4
# BERT training loop
for _ in trange(epochs, desc="Epoch"):
    ## TRAINING
    # Set model to training mode
    model.train()
    # Tracking variables
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0
    # Train the data for one epoch
    for step, batch in enumerate(train_dataloader):
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch
        # Clear out the gradients (by default they accumulate)
        optimizer.zero_grad()
        # Forward pass
        loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)
        train_loss_set.append(loss.item())
        # Backward pass
        loss.backward()
        # Update parameters and take a step using the computed gradient
        optimizer.step()
        # Update tracking variables
        tr_loss += loss.item()
        nb_tr_examples += b_input_ids.size(0)
        nb_tr_steps += 1
    print("Train loss: {}".format(tr_loss/nb_tr_steps))

    ## VALIDATION
    # Put model in evaluation mode
    model.eval()
    # Tracking variables
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0
    # Evaluate data for one epoch
    for batch in validation_dataloader:
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch
        # Telling the model not to compute or store gradients, saving memory and speeding up validation
        with torch.no_grad():
            # Forward pass, calculate logit predictions
            logits = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        eval_accuracy += tmp_eval_accuracy
        nb_eval_steps += 1
    print("Validation Accuracy: {}".format(eval_accuracy/nb_eval_steps))
On this line:
loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)
I get the error:
ValueError: Expected input batch_size (32) to match target batch_size (4096).
I understand that the problem is a tensor shape mismatch; what I don’t understand is why it is happening.
Before this step, train_dataloader is created as follows:
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
where train_data is created with:
train_data = TensorDataset(train_inputs, train_masks, train_labels)
and the three tensors:
train_masks = torch.tensor(train_masks)
train_labels = torch.tensor(train_labels)
train_inputs = torch.tensor(train_inputs)
all have the size torch.Size([14505, 128]).
The batch_size is 32, and train_sampler = RandomSampler(train_data).
I’m unsure where this mismatch is happening or why…
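One thing that may be worth checking: 4096 is exactly 32 × 128, the batch size times the sequence length. Below is a minimal sketch of that arithmetic. It assumes (the post does not confirm this) that the model is a sequence-classification head producing one row of logits per sequence, and that the loss flattens the labels to one dimension before comparing them. The value of num_labels is hypothetical and used only for illustration.

```python
from math import prod

# Shapes taken from the post: each batch holds 32 rows of length 128.
batch_size = 32
seq_len = 128

# Assumption: one row of logits per sequence, labels flattened by the loss.
num_labels = 2  # hypothetical; the post does not state the number of classes
logits_shape = (batch_size, num_labels)
labels_shape = (batch_size, seq_len)  # train_labels is [14505, 128] in the post

input_batch = logits_shape[0]      # rows the loss would see as predictions
target_batch = prod(labels_shape)  # rows the loss would see after flattening

print(input_batch, target_batch)
```

If that assumption holds, labels with one value per sequence (size [14505]) would line up with the 32 logit rows, whereas one label per token would point toward a token-classification head instead.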