CrossEntropyLoss target size

kdeary · December 5, 2022, 6:30pm

Beginner here. I am having trouble with target size and how to reshape for CrossEntropyLoss.

I received this error: **RuntimeError** : Expected target size [100, 4], got [100].

I’ve seen that many others have had this same issue, but I still do not understand the other posts. How do I reshape?

# define model
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.layer1 = nn.Linear(28, 64)
        self.layer2 = nn.ReLU()
        self.layer3 = nn.Linear(64, 32)
        self.layer4 = nn.ReLU()
        self.layer5 = nn.Linear(32, 4) # 4 classes
        
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)        
        return x

model = Model()

# loss & optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)

#train model
num_epochs = 5

for epoch in range(num_epochs):
    for i, data in enumerate(train_loader):
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward propagation
        outputs = model(inputs)
        
        loss = criterion(outputs, labels)

        # backward propagation & optimize
        loss.backward()
        optimizer.step()

# shapes after layers
Shape after layer: torch.Size([100, 28, 64])
Shape after layer: torch.Size([100, 28, 64])
Shape after layer: torch.Size([100, 28, 32])
Shape after layer: torch.Size([100, 28, 32])
Shape after layer: torch.Size([100, 28, 4])

ptrblck · December 5, 2022, 10:48pm

Based on the posted shapes your output has the shape [batch_size=100, nb_classes=28, seq_len=4], which requires the target to have the shape [batch_size=100, seq_len=4].
However, based on your model architecture I assume you want to use 4 classes in the prediction.
Since your output is 3-dimensional: are you using a sequence of samples or is the input reshaped in a wrong way?
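The shape rule above can be sketched in plain Python. This is a minimal illustration, assuming batched logits and class-index targets; `expected_target_shape` is a hypothetical helper, not part of PyTorch.

```python
def expected_target_shape(logits_shape):
    """Return the class-index target shape nn.CrossEntropyLoss would expect.

    Hypothetical helper for illustration only. It mirrors the rule that
    logits shaped [N, C, d1, ...] pair with targets shaped [N, d1, ...],
    i.e. dim 1 is always read as the class dimension.
    """
    if len(logits_shape) < 2:
        raise ValueError("batched logits need at least [batch_size, nb_classes]")
    batch_size, _nb_classes, *extra_dims = logits_shape
    return [batch_size, *extra_dims]


# The 3-dimensional output from the question: dim 1 (28) is taken as the classes.
print(expected_target_shape([100, 28, 4]))  # [100, 4]

# A 2-dimensional output with 4 classes pairs with a target of shape [100].
print(expected_target_shape([100, 4]))  # [100]
```

The first result matches the "Expected target size [100, 4]" in the error message, which is why a target of shape [100] is rejected.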

J_Johnson · December 6, 2022, 1:29am

You’re running images through Linear layers.

Linear layers tend to perform poorly on images; Conv2d layers are better suited to distilling 2D patterns into simpler features. For example, see here: Training a Classifier — PyTorch Tutorials 1.13.0+cu117 documentation
You’re ignoring one of the dims of your images. If you must put them through linear layers, try flattening your tensor before putting it into the model. This will require changing your layer sizes, in particular the first layer.

inputs = inputs.view(-1, 28*28)
self.layer1 = nn.Linear(28*28, 64)
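A minimal end-to-end sketch of that flattening change, assuming random stand-in data in place of the real `train_loader`. The class name `FlatModel` and the use of `nn.Flatten` inside the model (instead of calling `.view` on the inputs) are illustrative choices, not the original poster's code.

```python
import torch
import torch.nn as nn


class FlatModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),            # [N, 28, 28] -> [N, 784]
            nn.Linear(28 * 28, 64),  # first layer now sees all 784 values
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 4),        # 4 classes
        )

    def forward(self, x):
        return self.net(x)


# Stand-in batch with the shapes from the thread (assumed, random data).
inputs = torch.randn(100, 28, 28)
labels = torch.randint(0, 4, (100,))  # class indices in [0, 3]

model = FlatModel()
outputs = model(inputs)               # [100, 4] = [batch_size, nb_classes]
loss = nn.CrossEntropyLoss()(outputs, labels)

print(outputs.shape, labels.shape, loss.item())
```

With the output reduced to [100, 4], the target of shape [100] is what the loss expects.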

kdeary · December 7, 2022, 12:17am

The data is based on 28x28 greyscale images, but it is not the images themselves.

From this code:

example = enumerate(test_loader)
batch_idx, (example_data, example_targets) = next(example)
print(example_data.shape)

torch.Size([1000, 28, 28])

ptrblck · December 7, 2022, 12:35am

@J_Johnson’s suggestion of flattening the input sounds reasonable, so did you try it to fix the issue?

kdeary · December 7, 2022, 3:44am

It did work! I am able to train the network.

My accuracy is currently at 0.32. Do you have any tips to increase the accuracy?

J_Johnson · December 7, 2022, 4:44am

Here are the top model architecture scores for MNIST Fashion dataset classification with links to papers.

CDog · January 1, 2023, 6:41pm

Hello there. I have seen a lot of others with this issue. Any advice would be appreciated. I am trying to train a GPT-2 model to take in a tokenized/padded input and predict the output. My batch size is 32. My max length is 343. I believe that the 768 comes from the model. I cannot get the loss function to work properly though. The training loop keeps throwing me errors like this:

RuntimeError: Expected target size [32, 768], got [32, 343]

This is my training loop so far.

# Define the model architecture
model = transformers.GPT2Model.from_pretrained('gpt2').to(device)

# Define the loss function
loss_function = nn.CrossEntropyLoss(ignore_index=0, reduction='mean')

# Define the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Set the model to training mode
model.train()
print(f"input_tensors.shape before the loop: {input_tensors.shape}")
print(f"output_tensors.shape before the loop: {output_tensors.shape}")

# Loop over the number of epochs
for epoch in range(num_epochs):
    # Initialize the epoch loss
    epoch_loss = 0
    
    # Loop over the data in the dataloader
    for input_tensors, output_tensors in dataloader:
        # Send the input and target tensors to the device
        input_tensors = input_tensors.to(device)
        output_tensors = output_tensors.to(device)
        # Zero gradients
        optimizer.zero_grad()
        
        # Begin Forward pass
        logits = model(input_tensors)[0]
        
        print(f"logits.shape: {logits.shape}")
        print(f"input_tensors.shape: {input_tensors.shape}")
        print(f"output_tensors.shape: {output_tensors.shape}")
        
        # Compute the loss
        loss = loss_function(logits, output_tensors)

        # Backward pass
        loss.backward()

        # Update the model parameters
        optimizer.step()

        # Add the loss to the epoch loss
        epoch_loss += loss.item()
        # Print the epoch loss
    print(f'Epoch {epoch+1}: Loss = {epoch_loss}')

The corresponding sizes:

input_tensors.shape before the loop: torch.Size([2625, 343])
output_tensors.shape before the loop: torch.Size([2625, 343])
logits.shape: torch.Size([32, 343, 768])
input_tensors.shape: torch.Size([32, 343])
output_tensors.shape: torch.Size([32, 343])

J_Johnson · January 2, 2023, 1:59am

If you’re having a specific problem with a module, best to start a new thread.

However, GPT2Model is a Huggingface model and you should refer to their documentation to find the correct solution.

You can find the documentation for GPT2Model here:

And here is a training tutorial using the GPT2Model:

Additionally, they have a discussion area here for asking questions related to their Transformer models:

ptrblck · January 2, 2023, 4:32am

You are running into the same issue as described in my previous post. nn.CrossEntropyLoss expects logits in the shape [batch_size, nb_classes, *] and targets in the shape [batch_size, *] containing class indices in the range [0, nb_classes-1] where * denotes additional dimensions.
Your current logits in the shape [32, 343, 768] correspond to batch_size=32, nb_classes=343, seq_len=768 which doesn’t match the expected target shape.
Assuming you are dealing with 768 classes, .permute the model output to have the shape [32, 768, 343] via logits = logits.permute(0, 2, 1) and it should work.
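A small sketch of the permute step on random stand-in tensors with the shapes from this thread, assuming the targets really are class indices in [0, 767]. It also shows an equivalent alternative, flattening the batch and sequence dimensions, which gives the same loss under the default mean reduction.

```python
import torch
import torch.nn as nn

batch_size, seq_len, nb_classes = 32, 343, 768

# Stand-ins for the model output and targets (random, assumed shapes).
logits = torch.randn(batch_size, seq_len, nb_classes)           # [32, 343, 768]
targets = torch.randint(0, nb_classes, (batch_size, seq_len))   # [32, 343]

criterion = nn.CrossEntropyLoss()

# Option 1: move the class dimension to dim 1 -> [32, 768, 343].
loss_permuted = criterion(logits.permute(0, 2, 1), targets)

# Option 2: merge batch and sequence dims -> logits [10976, 768], targets [10976].
loss_flat = criterion(logits.reshape(-1, nb_classes), targets.reshape(-1))

# Both average over the same 32 * 343 positions, so they should agree
# up to floating-point error.
print(loss_permuted.item(), loss_flat.item())
```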

CDog · January 2, 2023, 3:59pm

Hi. Thanks for the response! I think that solution brings me one step closer to training the model. Here is my new error. The target value in the message changes randomly between runs (4868, 1330, 1640, etc.), so I have posted 2 examples.

return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target 4868 is out of bounds.

IndexError: Target 1330 is out of bounds.

Updated code:

# Loop over the number of epochs
for epoch in range(num_epochs):
    # Initialize the epoch loss
    epoch_loss = 0
    
    # Loop over the data in the dataloader
    for input_tensors, output_tensors in dataloader:
        # Send the input and target tensors to the device
        input_tensors = input_tensors.to(device)
        output_tensors = output_tensors.type(torch.LongTensor)
        output_tensors = output_tensors.to(device)
        # Zero gradients
        optimizer.zero_grad()
        
        # Begin Forward pass
        logits = model(input_tensors)[0]
        
        print(f"logits.shape: {logits.shape}")
        print(f"input_tensors.shape: {input_tensors.shape}")
        print(f"output_tensors.shape: {output_tensors.shape}")
        logits = logits.permute(0, 2, 1)
        # Compute the loss
        print(f"logits.shape after permute: {logits.shape}")
        loss = loss_function(logits, output_tensors)


        # Backward pass
        loss.backward()

        # Update the model parameters
        optimizer.step()

        # Add the loss to the epoch loss
        epoch_loss += loss.item()
        # Print the epoch loss
    print(f'Epoch {epoch+1}: Loss = {epoch_loss}')

We can see the shape does change after the permute.

logits.shape: torch.Size([32, 343, 768])
input_tensors.shape: torch.Size([32, 343])
output_tensors.shape: torch.Size([32, 343])
logits.shape after permute: torch.Size([32, 768, 343])

If you have any ideas on what could be going wrong I would love to hear them. I will keep digging until then. Thanks again.

I also get these warnings every time I run. I'm not sure if they are related to my error, but I will dig into them as well. I think they come from TensorFlow rather than PyTorch, so I don’t expect a solution here. I DO have a 2080 Ti on my machine.

W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

ptrblck · January 2, 2023, 9:13pm

Your permuted logits in the shape [batch_size=32, nb_classes=768, seq_len=343] contain values for 768 classes, which means the target should contain values in [0, 767]. Every other index will create an out-of-bounds error.
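A sketch of how the out-of-bounds condition could be checked before calling the loss. `check_targets` is a hypothetical helper. The second half rests on an assumption the thread does not confirm: that 768 is GPT-2's hidden size (`GPT2Model` returns hidden states, not vocabulary logits) and that the targets are token IDs from a 50257-entry vocabulary. Under that reading, a projection to the vocabulary size is needed before the loss; Hugging Face's `GPT2LMHeadModel` provides one, while the `lm_head` below is a randomly initialised stand-in.

```python
import torch
import torch.nn as nn


def check_targets(targets, nb_classes, ignore_index=-100):
    """Raise a readable error if a non-ignored target is outside [0, nb_classes - 1]."""
    valid = targets[targets != ignore_index]
    if valid.numel() and (valid.min() < 0 or valid.max() >= nb_classes):
        raise ValueError(
            f"targets span [{valid.min().item()}, {valid.max().item()}], "
            f"but the logits only cover [0, {nb_classes - 1}]"
        )


# Assumed values: GPT-2 (small) hidden size and vocabulary size.
hidden_size, vocab_size = 768, 50257

# Random stand-in for model(input_tensors)[0]; tiny batch to keep it light.
hidden_states = torch.randn(2, 5, hidden_size)
# Token-ID-like targets, using values from the error messages; 0 is padding.
targets = torch.tensor([[4868, 1330, 1640, 0, 0], [17, 42, 0, 0, 0]])

# Treating the 768 hidden features as classes fails the range check.
try:
    check_targets(targets, hidden_size, ignore_index=0)
except ValueError as err:
    print(err)

# Hypothetical projection from hidden states to one logit per vocabulary entry.
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
logits = lm_head(hidden_states)                       # [2, 5, vocab_size]

check_targets(targets, vocab_size, ignore_index=0)    # passes silently
loss = nn.CrossEntropyLoss(ignore_index=0)(logits.permute(0, 2, 1), targets)
print(logits.shape, loss.item())
```

If the targets are instead meant to be one of 768 classes, the check points at the data preparation rather than the model.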