RuntimeError: Expected target size [2, 30000], got [2]

I know this has been asked before (ValueError: Expected target size (32, 7), got torch.Size([32])), but that thread didn’t solve my problem.

When computing the loss with “torch.nn.CrossEntropyLoss()”, I get “RuntimeError: Expected target size [2, 30000], got [2]”.

I saw a comment in that post explaining that "nn.CrossEntropyLoss expects a model output in the shape [batch_size, nb_classes, *additional_dims] and a target in the shape [batch_size, *additional_dims]". If that’s the case, I’m not sure why my targets (labels) only have the batch dimension, [2].
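To make the quoted shape rule concrete, here is a minimal sketch with made-up sizes (batch of 2, 26 classes, and an extra dimension of 5 chosen only for illustration): the class logits sit in dim 1 of the input, and the target drops that dimension and holds class indices.

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

# Plain classification: input [batch_size, nb_classes], target [batch_size]
logits = torch.randn(2, 26)
target = torch.randint(0, 26, (2,))
loss_2d = loss_fn(logits, target)
print(loss_2d.shape)  # scalar loss: torch.Size([])

# With one additional dimension (e.g. 5 positions per sample):
# input [batch_size, nb_classes, 5], target [batch_size, 5]
logits_nd = torch.randn(2, 26, 5)
target_nd = torch.randint(0, 26, (2, 5))
loss_nd = loss_fn(logits_nd, target_nd)
print(loss_nd.shape)  # also a scalar with the default reduction="mean"
```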

This is the model; I have 26 output classes:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("asafaya/albert-base-arabic")
hidden_size = model.config.hidden_size
# Attach a new 26-class linear layer as `model.classifier`
model.classifier = torch.nn.Linear(in_features=hidden_size, out_features=26)
```
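A possible reason the 26-class layer has no effect: assigning a new attribute to an `nn.Module` registers it as a submodule, but `forward` only runs the layers it explicitly calls. The toy model below is hypothetical (`TinyMaskedLM` is not the ALBERT code); it only illustrates that a `classifier` attached after construction is ignored unless `forward` uses it. As far as I know, `AlbertForMaskedLM.forward` runs the encoder and its MLM head (`predictions`) and never references a `classifier` attribute, which would be consistent with the `[2, 256, 30000]` logits shape.

```python
import torch
from torch import nn


class TinyMaskedLM(nn.Module):
    """Hypothetical stand-in for a masked LM: per-token scores over the vocab."""

    def __init__(self, vocab_size=100, hidden_size=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.predictions = nn.Linear(hidden_size, vocab_size)  # MLM head

    def forward(self, input_ids):
        hidden = self.embed(input_ids)   # [batch, seq_len, hidden]
        return self.predictions(hidden)  # [batch, seq_len, vocab_size]


model = TinyMaskedLM()
model.classifier = nn.Linear(16, 26)  # registered as a submodule...

input_ids = torch.randint(0, 100, (2, 8))
logits = model(input_ids)
print("classifier" in dict(model.named_children()))  # True
print(logits.shape)  # ...but never called: torch.Size([2, 8, 100])
```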

I get the error while training, when calling CrossEntropyLoss:

```python
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

model.to(device)
# Set the model to train mode
model.train()

num_epochs = 50
# One optimizer step per batch, e.g. 46416 samples / batch size 64
# = 725.25, i.e. 726 batches per epoch (without drop_last)
num_training_steps = len(train_dataloader) * num_epochs
# Warm up over the first 10% of the steps
num_warmup_steps = int(0.1 * num_training_steps)

# Define the optimizer and the scheduler
optimizer = AdamW(model.parameters(), lr=0.001)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)

# Define the loss function
loss_fn = torch.nn.CrossEntropyLoss()

# Fine-tune the model
for epoch in range(num_epochs):
    for step, batch in enumerate(train_dataloader):
        # Unpack the batch and move every tensor to the model's device
        input_ids, attention_masks, labels = batch
        input_ids = input_ids.to(device)
        attention_masks = attention_masks.to(device)
        labels = labels.to(device)

        # Forward pass
        logits = model(input_ids=input_ids, attention_mask=attention_masks).logits

        # Compute the loss (Here is where the error happens)
        loss = loss_fn(logits, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```

logits shape is: torch.Size([2, 256, 30000])
labels shape is: torch.Size([2])
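The error can be reproduced with small, made-up sizes (batch 2, sequence length 4, vocab 10). Because `CrossEntropyLoss` treats dim 1 of a 3-D input as the class dimension, a `[2, 256, 30000]` tensor is read as 256 classes plus an additional dimension of 30000, so the loss asks for a target of shape `[2, 30000]`:

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

# Small stand-ins for the real shapes: [2, 4, 10] plays the role of [2, 256, 30000]
logits = torch.randn(2, 4, 10)      # [batch, seq_len, vocab]
labels = torch.randint(0, 4, (2,))  # [batch], one label per sample

message = ""
try:
    loss_fn(logits, labels)
except (RuntimeError, ValueError) as err:  # some older PyTorch versions raised ValueError
    message = str(err)
print(message)  # reports the expected target size vs. the one received

# What the loss would accept instead: dim 1 (size 4) is read as the classes and
# dim 2 (size 10) as an extra dimension, so the target must be [batch, 10]
# with class indices in [0, 4).
accepted = loss_fn(logits, torch.randint(0, 4, (2, 10)))
print(accepted.shape)  # torch.Size([])
```

This only shows which shapes the loss accepts; it does not mean such a target is meaningful for a 26-class task.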

Thank you!

Double post from here with follow-up.

Thank you! Should I leave it for now, as reference for you?

That’s correct, so you would need to check why your model output contains the additional dimension, since the 30000 indicates, e.g., a temporal dimension that is missing in the target.

I know that 2 is the batch size in both the logits and the labels. 30000 seems to be the vocab_size, judging by the model.config output.

```python
model.config
```

Output:

```
AlbertConfig {
  "_name_or_path": "asafaya/albert-base-arabic",
  "architectures": [
    "AlbertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0,
  "bos_token_id": 2,
  "classifier_dropout_prob": 0.1,
  "down_scale_factor": 1,
  "embedding_size": 128,
  "eos_token_id": 3,
  "gap_size": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "inner_group_num": 1,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "albert",
  "net_structure_type": 0,
  "num_attention_heads": 12,
  "num_hidden_groups": 1,
  "num_hidden_layers": 12,
  "num_memory_blocks": 0,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.26.0.dev0",
  "type_vocab_size": 2,
  "vocab_size": 30000
}
```

Does the vocab size represent the number of classes? If so, what does the 256 represent?
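One reading of the shapes, offered as an assumption rather than a confirmed answer: for a masked-LM head the logits are `[batch_size, seq_len, vocab_size]`, so 256 would be the padded sequence length and 30000 the vocabulary, i.e. one vocabulary distribution per token rather than 26 class scores per sample. A sequence-level classifier instead pools each sequence into one vector before the linear layer. The sketch below is hypothetical: random hidden states stand in for the encoder output, and the first token's vector is used as the pooled representation (one common choice, not necessarily what a given library does).

```python
import torch
from torch import nn

batch_size, seq_len, hidden_size, num_classes = 2, 256, 768, 26

# Stand-in for the encoder's last hidden states: [batch, seq_len, hidden]
hidden_states = torch.randn(batch_size, seq_len, hidden_size)

classifier = nn.Linear(hidden_size, num_classes)
pooled = hidden_states[:, 0]   # first-token vector per sequence: [batch, hidden]
logits = classifier(pooled)    # [batch, num_classes] = [2, 26]

labels = torch.randint(0, num_classes, (batch_size,))  # [batch] = [2]
loss = nn.CrossEntropyLoss()(logits, labels)
print(logits.shape, loss.shape)  # torch.Size([2, 26]) torch.Size([])
```

In `transformers`, `AutoModelForSequenceClassification.from_pretrained(..., num_labels=26)` builds a classification head of this general kind (with its own pooling), so its `.logits` should already be `[batch_size, 26]`; check the ALBERT docs for the exact pooling it uses.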