ValueError: Expected input batch_size (324) to match target batch_size (4)

Hi,
I am facing an issue. Please help!
Labels shape: torch.Size([4])
inputs shape: torch.Size([4, 512])

ValueError Traceback (most recent call last)
in <cell line: 14>()
     20 print("Labels shape:", labels.shape)
     21 print("inputs shape:", inputs.shape)
---> 22 outputs = model(inputs, attention_mask=attention_mask, labels=labels)
     23 loss = outputs.loss  # Use the loss directly from the model's output
     24 loss.backward()

/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
3057 if size_average is not None or reduce is not None:
3058 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3059 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
3060
3061

ValueError: Expected input batch_size (2048) to match target batch_size (4).

My code:

class NERDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, label_map):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.label_map = label_map

    def __len__(self):
        return len(self.texts)

    def encode_label_to_id(self, label):
        return self.label_map[label]

    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]
        # Tokenize the text and convert to input features
        encoding = self.tokenizer(text, truncation=True, padding='max_length', return_tensors='pt', max_length=512)
        # Extract input_ids and attention_mask
        input_ids = encoding['input_ids'].flatten()
        attention_mask = encoding['attention_mask'].flatten()
        label_id = self.encode_label_to_id(label)

        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': torch.tensor(label_id, dtype=torch.long)  # Use the encoded label
        }

train_dataset = NERDataset(train_texts, train_labels, tokenizer, label_map)
val_dataset = NERDataset(val_texts, val_labels, tokenizer, label_map)

batch_size = 4  # Adjust as needed
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size)
import torch
from transformers import AdamW
from transformers import get_linear_schedule_with_warmup
import numpy as np

optimizer = AdamW(model.parameters(), lr=5e-5) # Example learning rate, adjust as needed
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=len(train_dataset))

# Define the loss function (unused below, since outputs.loss is taken from the model)
loss_fn = torch.nn.CrossEntropyLoss()

# Fine-tune the model
model.train()
for epoch in range(1):
    for batch in train_dataloader:
        optimizer.zero_grad()
        inputs = batch['input_ids']
        attention_mask = batch['attention_mask']
        labels = batch['labels']
        print("Labels shape:", labels.shape)
        print("inputs shape:", inputs.shape)
        outputs = model(inputs, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss  # Use the loss directly from the model's output
        loss.backward()
        optimizer.step()
        scheduler.step()

# Evaluate the fine-tuned model
model.eval()
for batch in val_dataloader:
    inputs = batch['input_ids']
    attention_mask = batch['attention_mask']
    labels = batch['labels']
    with torch.no_grad():
        outputs = model(inputs, attention_mask=attention_mask)

It seems you are flattening the output at some point, which causes the shape mismatch.
Your code is neither properly formatted nor executable, so check your model definition and isolate where the forward activations are reshaped to a flat tensor.
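For reference, the numbers fit a token classification head: the loss sees 4 × 512 = 2048 per-token logits against only 4 per-sequence labels. A minimal sketch of per-token labels, assuming the model is a token classification head (the checkpoint name and label ids below are placeholders, not the poster's actual setup):

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Placeholder checkpoint; the real model may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=4)

enc = tokenizer("John lives in Paris", truncation=True, padding='max_length', max_length=512, return_tensors='pt')
labels = torch.full((1, 512), -100, dtype=torch.long)  # -100 = ignored by the loss
labels[0, 1:5] = torch.tensor([1, 0, 0, 2])            # illustrative per-token label ids

out = model(**enc, labels=labels)  # 512 per-token logits vs 512 targets: shapes match
print(out.loss)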

I’m encountering a ValueError during the training phase of my sequence labeling (classification) model in PyTorch. I vectorize the text with word2vec and feed the numerical data into an RNN model.
The error suggests a mismatch between the input batch size and the target batch size. Here’s the section of the code where the error occurs:

# Run the training loop

N_EPOCHS = 3  # Number of epochs
best_valid_loss = float('inf')
print(f"Using {'GPU' if str(DEVICE) == 'cuda' else 'CPU'} for training.")

for epoch in range(N_EPOCHS):
    start_time = time.time()  # Start time of the epoch

    train_loss, train_acc = train(model, train_loader, optimizer, criterion, DEVICE)
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')

    valid_loss, valid_acc = evaluate(model, valid_loader, criterion, DEVICE)
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

    end_time = time.time()  # End time of the epoch
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)  # Calculate epoch duration

    print(f'Epoch: {epoch+1} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f}, Train Acc: {train_acc*100:.2f}%')
    print(f'\tValid Loss: {valid_loss:.3f}, Valid Acc: {valid_acc*100:.2f}%')

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'rnn_model.pt')

I have tried so many things but I can’t get rid of the issue. Please help!

@ptrblck

Your current code snippet does not show anything interesting as I assume the error is raised in the train method. Try to come up with a minimal and executable code snippet reproducing the issue.

@ptrblck Thanks for the prompt response!
I will try to write minimal code related to the issue (sorry, it is a bit long because everything is linked).

My optimizer and loss function are:

optimizer = optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss(ignore_index=-100)

I added some manual prints (during the epoch stage) to understand the error:

Batch shapes - Vectors: torch.Size([1, 300]), Lengths: torch.Size([1]), Labels: torch.Size([1, 61])
Before Forward Pass - Input Vectors Shape: torch.Size([1, 300]), Input Lengths Shape: torch.Size([1])
After Forward Pass - Predictions Shape: torch.Size([1, 300, 4])
Before Loss Calculation - Predictions Shape: torch.Size([300, 4]), Labels Shape: torch.Size([61])

My Error: ValueError: Expected input batch_size (300) to match target batch_size (61).
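For what it’s worth, the printed shapes reproduce the error with random tensors alone, which is roughly the kind of minimal snippet asked for above:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-100)
predictions = torch.randn(300, 4)    # predictions shape printed before the loss
labels = torch.randint(0, 4, (61,))  # labels shape printed before the loss
criterion(predictions, labels)
# ValueError: Expected input batch_size (300) to match target batch_size (61).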

Relevant code

class TextDataset(Dataset):
    def __init__(self, vectors, labels):
        self.vectors = vectors
        self.labels = labels

    def __len__(self):
        return len(self.vectors)

    def __getitem__(self, idx):
        return self.vectors[idx], self.labels[idx]
        # vector = self.vectors[idx]
        # label = self.labels[idx]
        # return torch.tensor(vector, dtype=torch.float), torch.tensor(label, dtype=torch.long)  # Converting vec, label to tensors

train_vectors = df_train['sentence_vectors'].tolist()
train_dataset = TextDataset(train_vectors, train_labels)

def collate_fn(batch):
    vectors, labels = zip(*batch)
    vectors_padded = pad_sequence([torch.tensor(vec, dtype=torch.float) for vec in vectors], batch_first=True, padding_value=0)
    labels_padded = pad_sequence([torch.tensor(label, dtype=torch.long) for label in labels], batch_first=True, padding_value=-100)
    lengths = torch.tensor([len(vec) for vec in vectors])
    return vectors_padded, lengths, labels_padded

# Then I created dataloaders for each split. Just writing one here
batch_size = 1
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_fn, num_workers=2)

class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, pretrained_embeddings):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.embedding.weight.data.copy_(pretrained_embeddings)  # Copy Word2Vec embeddings
        self.embedding.weight.requires_grad = False  # Optionally freeze embeddings
        self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)  # Multiple classes (4)

    def forward(self, texts, lengths):
        texts = texts.long()  # Convert to LongTensor
        embedded = self.embedding(texts)
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, lengths.cpu(), batch_first=True, enforce_sorted=False)
        packed_output, _ = self.rnn(packed_embedded)
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output, batch_first=True)
        # Apply the fully connected layer to each time step output
        output = self.fc(output)
        return output

Lastly, this is my model:

INPUT_DIM = len(word2vec_model.wv) # Vocabsize
EMBEDDING_DIM = num_features
HIDDEN_DIM = 256
OUTPUT_DIM = 4 # 4 labels 
PRETRAINED_EMBEDDINGS = torch.FloatTensor(word2vec_model.wv.vectors) # Convert Word2Vec embeddings to Torch tensor
model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, PRETRAINED_EMBEDDINGS)
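One reading of the printed shapes, purely as an assumption: texts.long() turns the 300 float components of a single word2vec sentence vector into 300 pseudo token indices, so the RNN sees a sequence of length 300 while the padded labels cover 61 tokens. A minimal sketch of the shape flow the loss expects, with illustrative sizes:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-100)

# With token *indices* of shape [batch, seq_len], the model output and the
# padded labels share the time dimension, so the flattened sizes agree.
batch, seq_len, num_classes = 1, 61, 4
predictions = torch.randn(batch, seq_len, num_classes)           # what forward() should return
labels_padded = torch.randint(0, num_classes, (batch, seq_len))  # one label per token
loss = criterion(predictions.view(-1, num_classes), labels_padded.view(-1))  # 61 == 61, OK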

ValueError: Expected input batch_size (6128) to match target batch_size (1904)

Can you help me with this issue?

class QAModel(L.LightningModule):
  def __init__(self, model_id, config):                                                 
    super().__init__()
    self.model = None
    if config:
      self.model = AutoModelForCausalLM.from_pretrained("4Ashwin/phi-2-medquad", trust_remote_code=True)
    self.config = config

  def forward(self, input_ids, attention_mask, labels=None):
    output = self.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    return output.loss, output.logits

  def training_step(self, batch, batch_idx):
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]
    labels = batch["labels"]
    print("Training Step: Input Shape - input_ids:", input_ids.shape, "attention_mask:", attention_mask.shape, "labels:", labels.shape)
    loss, outputs = self(input_ids, attention_mask, labels)
    self.log("train_loss", loss, prog_bar=True, logger=True)
    return {"loss": loss, "predictions":outputs, "labels": labels}

  def validation_step(self, batch, batch_idx):
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]
    labels = batch["labels"]
    print("Validation Step : Input Shape - input_ids:", input_ids.shape, "attention_mask:", attention_mask.shape, "labels:", labels.shape)
    loss, outputs = self(input_ids, attention_mask, labels)
    self.log("val_loss", loss, prog_bar=True, logger=True)
    return loss

  def test_step(self, batch, batch_idx):
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]
    labels = batch["labels"]
    print("Test Step -  Input Shape - input_ids:", input_ids.shape, "attention_mask:", attention_mask.shape, "labels:", labels.shape)
    loss, outputs = self(input_ids, attention_mask, labels)
    self.log("test_loss", loss, prog_bar=True, logger=True)
    return loss

  def configure_optimizers(self):
    return AdamW(self.parameters(), lr=0.0001)


model = QAModel(model_id, config)

# Check model's final layer
final_layer = list(model.model.children())[-1]
in_features = final_layer.in_features
out_features = final_layer.out_features
print("hellooooo",in_features,out_features)


logger = TensorBoardLogger("lightning_logs", name="medqa-model")

def get_callback(filename="medqa-model"):
    checkpoint_callback = ModelCheckpoint(
        dirpath="checkpoints",
        filename=filename,
        save_top_k=1,
        verbose=True,
        monitor="val_loss",
        mode="min"
    )
    return checkpoint_callback

checkpoint_callback = get_callback("best-checkpoint-teacher")
trainer = L.Trainer(
    logger=logger,
    callbacks=[checkpoint_callback],
    num_nodes=1,
    max_epochs=1
)
trainer.fit(model, data_module)

The output contains:
hellooooo 2560 51200
Validation Step : Input Shape - input_ids: torch.Size([16, 384]) attention_mask: torch.Size([16, 384]) labels: torch.Size([16, 120])

I am trying to apply knowledge distillation to my fine-tuned language model. I got the KD code online for a Hugging Face t5-small model trained on the SQuAD dataset, and I am now trying to apply it to another model, with a data_module for the corresponding custom data.
It’s hard to tell what exactly is causing the shape mismatch as the model’s internals are unknown. First make sure the batch sizes of the input and target are the same. If they are, check the stacktrace and use it to narrow down which layer or operation fails. Once isolated, check if any previous operation changes the batch size.
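One way to do that narrowing down is to print each module’s output shape with forward hooks, so the first layer whose output loses the expected batch dimension stands out (a sketch; output types vary by module, and model is whatever module raises the error):

def shape_hook(name):
    def hook(module, inputs, output):
        if hasattr(output, 'shape'):
            print(f'{name}: {tuple(output.shape)}')
    return hook

for name, module in model.named_modules():
    module.register_forward_hook(shape_hook(name))
# Run a single batch; read the printout top to bottom to find the first
# module whose output shape no longer carries the expected batch size.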