Cross Entropy Loss: Target size and Output size mismatch

I have a problem using categorical cross-entropy loss.
The target data is imported from a NumPy array containing label indices for 3 classes (0, 1, 2).

Dataset definition

class Tr_dataset(Dataset):
    def __init__(self, windowed_input, classification_target):
        self.windowed_input = windowed_input
        self.classification_target = classification_target

    def __len__(self):
        return len(self.windowed_input)

    def __getitem__(self, index):
        x_input = self.windowed_input[index]
        x_target = self.classification_target[index]

        x_input_tensor = torch.Tensor(x_input)
        x_input_tensor = x_input_tensor.view(SEQUENCE_LENGTH, INPUT_SIZE)

        # Ground truth: the class index for this window
        x_target_tensor = torch.LongTensor(x_target)
        return x_input_tensor, x_target_tensor
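
For completeness, here is roughly how the data ends up in the loader (dummy stand-in arrays here, not my real data; the constants are the ones listed further down in this post):

import numpy as np
import torch
from torch.utils.data import DataLoader

# Dummy stand-ins: 120 windows of 12 time steps, one class index (0, 1 or 2) per window
windowed_input = np.random.randn(120, SEQUENCE_LENGTH, INPUT_SIZE)
classification_target = np.random.randint(0, 3, size=(120, 1))

dataset = Tr_dataset(windowed_input, classification_target)
train_loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

x_input, x_target = next(iter(train_loader))
print(x_input.shape)   # torch.Size([12, 12, 1])
print(x_target.shape)  # torch.Size([12, 1])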

Model

class classification_RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(classification_RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,
                          num_layers=num_layers, batch_first=True)

        self.out = nn.Linear(hidden_size, 3)

    def forward(self, x):
        # The initial hidden state is automatically initialized to zeros if not passed
        rnn_out, hidden = self.rnn(x)

        # The linear layer is applied to every time step, so this is [batch, seq, 3]
        class_label = self.out(rnn_out)

        return class_label

Relevant model information:
model = classification_RNN(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE, num_layers=NUM_LAYERS)

Where:

# Model params
BATCH_SIZE = 12

# The sequence length of the windowed data input
SEQUENCE_LENGTH = 12

# This is actually the number of dimensions/features in the input
INPUT_SIZE = 1

# The number of features in the last hidden state (which should definitely be one)
# Is this also equal to the number of output time steps to predict?
HIDDEN_SIZE = 1

# The number of RNN layers to stack
NUM_LAYERS = 3

Since this post is already getting long, here is a partial training loop:

for bi, (x_input, x_target) in enumerate(train_loader):
    model.train()

    x_input_batch, x_target_batch = x_input.to(device), x_target.to(device)

    optimizer.zero_grad()

    output_batch = model(x_input_batch)

    loss = criterion(output_batch, x_target_batch)

The two inputs to the criterion seem to have a size mismatch.

From everything I have read, I seem to be doing this correctly. But I get the error:

ValueError: Expected target size (12, 3), got torch.Size([12, 1])

Can you print the shapes of output_batch and x_target_batch, before they are passed to the loss function?
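
For example, right before the criterion call in your training loop:

print('output_batch: ', output_batch.shape)
print('x_target_batch: ', x_target_batch.shape)
loss = criterion(output_batch, x_target_batch)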


The output batch has size:
torch.Size([12, 12, 3])

The target batch has size:
torch.Size([12, 1])

I changed a line in forward to

class_label = self.out(rnn_out.contiguous().view(-1, self.hidden_size))

to meet the input expectation of the linear layer.

Now the output batch has size:
torch.Size([144, 3])

I want it to be [12, 3].
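
As far as I can tell, the 144 is just the batch and sequence dimensions flattened together (12 × 12 = 144). A quick check with random tensors, assuming HIDDEN_SIZE = 1:

import torch
import torch.nn as nn

rnn_out = torch.randn(12, 12, 1)           # [batch, seq, hidden] as returned by the RNN
flat = rnn_out.contiguous().view(-1, 1)    # batch and seq flattened together: [144, 1]
logits = nn.Linear(1, 3)(flat)             # one row of 3-class logits per time step: [144, 3]
print(flat.shape, logits.shape)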

Let us assume you did not reshape, and the original error was about the output shape:

The output batch has size:
torch.Size([12, 12, 3])

The target batch has size:
torch.Size([12, 1])

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
a = torch.randn((12, 12, 3))
b = torch.randn((12, 1))
# Expected target size (12, 3), got torch.Size([12, 1])
# print('Input Shape: ', a.shape)
# print('Target Shape: ', b.shape)
# print('Loss: ', ce(a, b))

# This works, as I reshape the input batch.
a = torch.randn((12, 12, 1))
b = torch.randint(0, 3, (12, 1))
print('Input Shape: ', a.shape)
print('Target Shape: ', b.shape)
print('Loss: ', ce(a, b))

In your original code, can you change “self.out = nn.Linear(hidden_size, 3)” to “self.out = nn.Linear(hidden_size, 1)”, print the shapes, and try?


But wouldn't I want the logits as one of the inputs to the cross-entropy loss?

Also, I am not trying to do binary classification; I have to predict among 3 classes.

I think the real problem I am facing is interfacing the output of the RNN with the linear layer.

Do you think this is correct?

self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,
                  num_layers=num_layers, batch_first=True)

self.out = nn.Linear(hidden_size, 3)

Although I’m not too familiar with the workings of RNNs, your implementation looks correct.

CrossEntropyLoss expects an input of shape (N, C) and a target of shape (N,). Additional dimensions are used for “K-dimensional loss”, as stated in the docs. Since your output batch has shape (12, 12, 3), the expected target shape is (12, 3), but your targets are (12, 1), which explains your error. You need to perform two reshapes:

The first one is

output_batch = output_batch[:, -1, :]

It works, but I have no idea why this specific “reshape”. Here’s a link to an RNN implementation for MNIST where I looked it up.

The second one is

x_target_batch = x_target_batch.view(-1)

This one is to satisfy the target shape required for the loss function.
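
Putting the two together, the shapes then line up for CrossEntropyLoss (a quick sanity check with random tensors):

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
output_batch = torch.randn(12, 12, 3)          # [batch, seq, classes], as your model returns
x_target_batch = torch.randint(0, 3, (12, 1))  # [batch, 1] class indices

output_batch = output_batch[:, -1, :]          # keep only the last time step -> [12, 3]
x_target_batch = x_target_batch.view(-1)       # flatten the targets -> [12]
print('Loss: ', ce(output_batch, x_target_batch))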


The RNN module returns 2 output tensors: the outputs after each time step and the last hidden state. We only use the first, which has shape [Batch, Seq, Hidden] with batch_first=True and num_directions=1. bibekx most likely only wants the output of the last time step, so we slice it with [:, -1, :]. The best place for this slicing is in the forward call of classification_RNN, right before the result is fed into the linear layer.
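
Something like this (sketching only the forward method; the rest of the class stays as it is):

def forward(self, x):
    # rnn_out: [batch, seq, hidden]; hidden: last hidden state of each layer
    rnn_out, hidden = self.rnn(x)

    # keep only the output of the last time step -> [batch, hidden]
    last_step = rnn_out[:, -1, :]

    # project to the 3 class logits -> [batch, 3]
    class_label = self.out(last_step)
    return class_label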


That makes sense now. Thank you for the explanation!


Thank you guys for the help.
I implemented it as recommended by @Caruso and @pchandrasekaran.

It seems to have started training, but it's not training as expected: as the training accuracy increases, so does the loss, which doesn't make sense.

The link to the code (please note some class names might be different):

Link to my GitHub

You’ve used the same variable for both acc and loss in the format statement.
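
Something along these lines, presumably (the variable names here are just placeholders, not the exact ones from your notebook):

# Buggy: the accuracy variable is passed twice, so the printed "loss" is really the accuracy
print('Epoch {} | acc: {:.4f} | loss: {:.4f}'.format(epoch, train_acc, train_acc))

# Fixed: pass the loss variable for the loss placeholder
print('Epoch {} | acc: {:.4f} | loss: {:.4f}'.format(epoch, train_acc, train_loss))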


Hi @pchandrasekaran, thanks! I fixed it. But now the accuracy is not improving.

@bibekx That’s a bit weird. I can’t tell how a variable change may have caused that.

What I would suggest is:

  1. Check the model weights in the test_model function to ensure that an updated model is in fact passed (a quick sketch for this is at the end of this post).
  2. Check that the dataloader in test_model is iterating through correctly.
  3. Play around with the optimizer params (and different optimizers) and epochs. The loss flatlines at only the 4th epoch.

Personally, I don’t think 1 or 2 is an issue. It has to be the optimizer. Also, before any retraining, run cells 6, 10 and 11 just to ensure all the weights get reinitialized.
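
For point 1, a quick sanity check is to snapshot one parameter before training and compare it after a few batches (a sketch; any parameter will do):

# Snapshot one weight tensor before training
before = model.out.weight.detach().clone()

# ... run a few training batches ...

after = model.out.weight.detach().clone()
print('weights changed: ', not torch.equal(before, after))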
