Based on the posted shapes your output has the shape [batch_size=100, nb_classes=28, seq_len=4], which requires the target to have the shape [batch_size=100, seq_len=4].
However, based on your model architecture I assume you want to use 4 classes in the prediction.
Since your output is 3-dimensional: are you using a sequence of samples or is the input reshaped in a wrong way?
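
As a minimal sketch of the shape requirement described above, the snippet below uses random tensors (not the poster's data) to show an output of shape [100, 28, 4] paired with a target of shape [100, 4]. The variable names are illustrative only.

```python
import torch
import torch.nn as nn

batch_size, nb_classes, seq_len = 100, 28, 4

# Model output laid out as [batch_size, nb_classes, seq_len]
output = torch.randn(batch_size, nb_classes, seq_len)
# One class index per sequence position: [batch_size, seq_len]
target = torch.randint(0, nb_classes, (batch_size, seq_len))

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)  # reduces to a scalar by default
print(output.shape, target.shape, loss.dim())
```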

You’re ignoring one of the dims of your images. If you must put it through linear layers, try flattening your tensor before putting it into the model. This will require changing your layer sizes, in particular, the first layer.
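
A rough sketch of the flattening suggestion, assuming single-channel 28x28 images and 4 output classes (both are guesses based on the shapes mentioned above; the hidden size of 64 is arbitrary). Note how the first linear layer now takes 28 * 28 input features.

```python
import torch
import torch.nn as nn

# Hypothetical batch of 100 images, each 28x28 (assumed sizes)
images = torch.randn(100, 28, 28)

model = nn.Sequential(
    nn.Flatten(),            # [100, 28, 28] -> [100, 784]
    nn.Linear(28 * 28, 64),  # first layer sized for the flattened input
    nn.ReLU(),
    nn.Linear(64, 4),        # 4 classes, as assumed above
)

output = model(images)
print(output.shape)  # [batch_size, nb_classes]
```

With a 2-dimensional output like this, nn.CrossEntropyLoss expects a target of shape [100] holding class indices.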

Hello there. I have seen a lot of others with this issue. Any advice would be appreciated. I am trying to train a GPT-2 model to take in a tokenized/padded input and predict the output. My batch size is 32. My max length is 343. I believe that the 768 comes from the model. I cannot get the loss function to work properly though. The training loop keeps throwing me errors like this:

    # Define the model architecture
    model = transformers.GPT2Model.from_pretrained('gpt2').to(device)
    # Define the loss function
    loss_function = nn.CrossEntropyLoss(ignore_index=0, reduction='mean')
    # Define the optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    # Set the model to training mode
    model.train()
    print(f"input_tensors.shape before the loop: {input_tensors.shape}")
    print(f"output_tensors.shape before the loop: {output_tensors.shape}")
    # Loop over the number of epochs
    for epoch in range(num_epochs):
        # Initialize the epoch loss
        epoch_loss = 0
        # Loop over the data in the dataloader
        for input_tensors, output_tensors in dataloader:
            # Send the input and target tensors to the device
            input_tensors = input_tensors.to(device)
            output_tensors = output_tensors.to(device)
            # Zero gradients
            optimizer.zero_grad()
            # Begin Forward pass
            logits = model(input_tensors)[0]
            print(f"logits.shape: {logits.shape}")
            print(f"input_tensors.shape: {input_tensors.shape}")
            print(f"output_tensors.shape: {output_tensors.shape}")
            # Compute the loss
            loss = loss_function(logits, output_tensors)
            # Backward pass
            loss.backward()
            # Update the model parameters
            optimizer.step()
            # Add the loss to the epoch loss
            epoch_loss += loss.item()
        # Print the epoch loss
        print(f'Epoch {epoch+1}: Loss = {epoch_loss}')

The corresponding sizes:

input_tensors.shape before the loop: torch.Size([2625, 343])
output_tensors.shape before the loop: torch.Size([2625, 343])
logits.shape: torch.Size([32, 343, 768])
input_tensors.shape: torch.Size([32, 343])
output_tensors.shape: torch.Size([32, 343])

You are running into the same issue as described in my previous post. nn.CrossEntropyLoss expects logits in the shape [batch_size, nb_classes, *] and targets in the shape [batch_size, *] containing class indices in the range [0, nb_classes-1] where * denotes additional dimensions.
Your current logits in the shape [32, 343, 768] correspond to batch_size=32, nb_classes=343, seq_len=768 which doesn’t match the expected target shape.
Assuming you are dealing with 768 classes, permute the model output to the shape [32, 768, 343] via logits = logits.permute(0, 2, 1) and it should work.
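
To illustrate the permute on stand-in data, here is a small sketch with random logits of shape [32, 343, 768] and random targets in [0, 767]. It also computes the loss on flattened tensors, which is an alternative layout that should give the same value; the name flat_loss is made up for this example.

```python
import torch
import torch.nn as nn

batch_size, seq_len, nb_classes = 32, 343, 768

# Stand-in for the model output: [batch_size, seq_len, nb_classes]
logits = torch.randn(batch_size, seq_len, nb_classes)
# Class indices in [0, nb_classes-1]: [batch_size, seq_len]
target = torch.randint(0, nb_classes, (batch_size, seq_len))

criterion = nn.CrossEntropyLoss(ignore_index=0, reduction='mean')

# Move the class dimension to dim 1: [batch_size, nb_classes, seq_len]
permuted = logits.permute(0, 2, 1)
loss = criterion(permuted, target)

# Alternative: treat every position as its own sample
flat_loss = criterion(logits.reshape(-1, nb_classes), target.reshape(-1))

print(permuted.shape, loss.item(), flat_loss.item())
```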

Hi. Thanks for the response! I think that solution brings me one step closer to training the model. Here is my new error. The out-of-bounds target value changes from run to run (4868, 1330, 1640, etc.), so I posted two examples.

return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target 4868 is out of bounds.

IndexError: Target 1330 is out of bounds.

Updated code:

    # Loop over the number of epochs
    for epoch in range(num_epochs):
        # Initialize the epoch loss
        epoch_loss = 0
        # Loop over the data in the dataloader
        for input_tensors, output_tensors in dataloader:
            # Send the input and target tensors to the device
            input_tensors = input_tensors.to(device)
            output_tensors = output_tensors.type(torch.LongTensor)
            output_tensors = output_tensors.to(device)
            # Zero gradients
            optimizer.zero_grad()
            # Begin Forward pass
            logits = model(input_tensors)[0]
            print(f"logits.shape: {logits.shape}")
            print(f"input_tensors.shape: {input_tensors.shape}")
            print(f"output_tensors.shape: {output_tensors.shape}")
            logits = logits.permute(0, 2, 1)
            # Compute the loss
            print(f"logits.shape after permute: {logits.shape}")
            loss = loss_function(logits, output_tensors)
            # Backward pass
            loss.backward()
            # Update the model parameters
            optimizer.step()
            # Add the loss to the epoch loss
            epoch_loss += loss.item()
        # Print the epoch loss
        print(f'Epoch {epoch+1}: Loss = {epoch_loss}')

We can see the shape does change after the permute.

If you have any ideas on what could be going wrong I would love to hear them. I will keep digging until then. Thanks again.

I also get these warnings every time I run the script. I am not sure whether they are related to my error, but I will dig into them as well. I think they come from TensorFlow rather than PyTorch, so I don’t expect a solution here. I DO have a 2080 Ti in my machine.

W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Your permuted logits in the shape [batch_size=32, nb_classes=768, seq_len=343] contain values for 768 classes, which means the target should contain values in [0, 767]. Any target index outside this range will raise an out-of-bounds error.
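
One way to catch this before the loss call is to compare the target range against the class dimension of the logits. The helper below, check_targets, is a hypothetical name and only a sketch on toy tensors. If the targets are GPT-2 token ids (an assumption based on the description of tokenized inputs), values such as 4868 would be expected, since the GPT-2 vocabulary has 50257 entries. In that case 768 is likely the hidden size of GPT2Model rather than a class count, and a model with a language-modeling head, such as GPT2LMHeadModel, would be needed to get one logit per vocabulary entry.

```python
import torch

def check_targets(logits, target, ignore_index=-100):
    """Raise a readable error if target holds indices outside [0, nb_classes-1].

    Expects logits as [batch_size, nb_classes, *] and target as [batch_size, *].
    """
    nb_classes = logits.size(1)
    kept = target[target != ignore_index]  # ignored positions are not checked
    if kept.numel() > 0:
        lo, hi = int(kept.min()), int(kept.max())
        if lo < 0 or hi >= nb_classes:
            raise ValueError(
                f"target values span [{lo}, {hi}] but logits only cover "
                f"classes [0, {nb_classes - 1}]"
            )

logits = torch.randn(2, 5, 3)                # [batch_size, nb_classes=5, seq_len=3]
good = torch.tensor([[0, 4, 2], [1, 3, 0]])
bad = torch.tensor([[0, 7, 2], [1, 3, 0]])   # 7 is not a valid class index

check_targets(logits, good, ignore_index=0)  # passes silently
try:
    check_targets(logits, bad, ignore_index=0)
    message = None
except ValueError as err:
    message = str(err)
print(message)
```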