RuntimeError: shape '[-1, 2]' is invalid for input of size 9. Please help me figure out this error.

import torch
import torch.optim as optim

optimizer = optim.AdamW(model.parameters(), lr = 2e-5)  # lr = 0.00002
criterion = torch.nn.CrossEntropyLoss()

num_epochs = 4
for epoch in range(num_epochs):
  optimizer.zero_grad()
  output = model(input_ids, attention_mask = attention_mask, labels = torch.tensor(sentiment_labels))
  loss = output.loss
  loss.backward()
  optimizer.step()

My sentiment labels are 0, 1, and 2, corresponding to positive, negative, and neutral.

sentiment_labels = [sentiments.index(sentiment) for sentiment in sentiments]
print(sentiment_labels)
print(sentiments)

Hi Ravindra,
The code that you have posted most probably does not contain the line that's producing the error RuntimeError: shape '[-1, 2]' is invalid for input of size 9.
You must be reshaping/viewing a tensor somewhere, and that's what causes this error: the number of elements in the original tensor (9) isn't compatible with the shape you're trying to achieve (some number of rows * 2 columns, which would need an even number of elements).
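
As a minimal standalone sketch of that failure mode (not your code, just the same mismatch reproduced with a toy tensor): a tensor with 9 elements can be viewed as (-1, 3) but not as (-1, 2), because 9 is not divisible by 2.

import torch

x = torch.arange(9)          # 9 elements
# x.view(-1, 2)              # raises: shape '[-1, 2]' is invalid for input of size 9
print(x.view(-1, 3).shape)   # works: torch.Size([3, 3])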

Could you please post the relevant part of the code? I can help debug further.

This is my model:

from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

This is my basic text data

texts = ['I loved the movie. It was great!',
         'The food was terrible.',
         'The weather is okay.']
sentiments = ['positive', 'negative', 'neutral']

Tokenize the text samples

encoded_texts = tokenizer(texts, padding = True, truncation = True, return_tensors = 'pt')
input_ids = encoded_texts['input_ids']
decoding = tokenizer.decode(input_ids[2])

Attention Mask
attention_mask = encoded_texts['attention_mask']
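
As a quick sanity check on shapes (assuming the three example texts above), the batch dimension should be 3 and the padded sequence length depends on the longest sentence:

print(input_ids.shape)       # expected: torch.Size([3, <padded_len>]) -- one row per text
print(attention_mask.shape)  # same shape as input_ids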

sentiment_labels = [sentiments.index(sentiment) for sentiment in sentiments]

import torch.nn as nn
num_classes = len(set(sentiment_labels))
classification_head = nn.Linear(model.config.hidden_size, num_classes)
model.classifier = classification_head

import torch
import torch.optim as optim


optimizer = optim.AdamW(model.parameters(), lr = 2e-5)  # lr = 0.00002
criterion = torch.nn.CrossEntropyLoss()

num_epochs = 3
for epoch in range(num_epochs):
  optimizer.zero_grad()
  output = model(input_ids, attention_mask = attention_mask, labels = torch.tensor(sentiment_labels))
  loss = output.loss
  loss.backward()
  optimizer.step()

The above code is simple, but I wasn't able to figure out where I'm making a mistake.
Thank you @srishti-git1110

Thanks for the code!
I looked into Hugging Face's source code and found that it's indeed a view operation that's causing this error. Specifically, it is:

loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

This line is in the forward method, so self (the object on which it's called) refers to the model. If you try print(model.num_labels), it indeed gives 2 rather than 3, which explains the 2 in your error: RuntimeError: shape '[-1, 2]' is invalid for input of size 9. The 9 comes from your batch of 3 texts times the 3 logits produced per text by the classifier head you attached, and 9 elements can't be split into rows of 2.
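
A quick way to see this on the model as you constructed it (just a sketch of the mismatch, based on the code you posted) is to compare what the config thinks with what the swapped-in head produces:

print(model.num_labels)               # 2 -- the default for DistilBertForSequenceClassification
print(model.classifier.out_features)  # 3 -- the head you attached manually

# forward() therefore produces logits of shape (3, 3) for your 3 texts -> 9 elements,
# and logits.view(-1, 2) fails because 9 is not divisible by 2.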

To get rid of this, you basically need to change only one line of code:

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=3)
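
With num_labels=3 the model is built with a 3-output classification head from the start, so the manual classification_head/model.classifier replacement is no longer needed, and the loss line becomes logits.view(-1, 3), which works on your 3 × 3 = 9 logits. A minimal check (assuming the rest of your script stays the same):

from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=3)
print(model.num_labels)               # 3
print(model.classifier.out_features)  # 3 -- matches the three sentiment classes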