Target size (torch.Size([10])) must be the same as input size (torch.Size([2]))

Hey guys, I am an ABSOLUTE newbie over here, and I am trying to use some frankensteined PyTorch code to create a neural network for a final project. I've been getting errors and fixing them, but I really can't figure this one out, and all the explanations I've found involve really complicated math. Any help is extremely appreciated.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.utils.data as data
import torchvision
from torchvision import transforms

EPOCHS = 2
BATCH_SIZE = 10
LEARNING_RATE = 0.003
TRAIN_DATA_PATH = "./images/train/"
TEST_DATA_PATH = "./images/test/"
TRANSFORM_IMG = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
    ])

train_data = torchvision.datasets.ImageFolder(root=TRAIN_DATA_PATH, transform=TRANSFORM_IMG)
train_data_loader = data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True,  num_workers=4)
test_data = torchvision.datasets.ImageFolder(root=TEST_DATA_PATH, transform=TRANSFORM_IMG)
test_data_loader  = data.DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=4) 

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 5)
        self.conv2 = nn.Conv2d(32, 64, 5)
        self.conv3 = nn.Conv2d(64, 128, 5)

        x = torch.randn(50,50).view(-1,1,50,50)
        self._to_linear = None
        self.convs(x)

        self.fc1 = nn.Linear(self._to_linear, 512)
        self.fc2 = nn.Linear(512, 2)

    def convs(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2,2))

        if self._to_linear is None:
            self._to_linear = x[0].shape[0]*x[0].shape[1]*x[0].shape[2]
        return x

    def forward(self, x):
        x = self.convs(x)
        x = x.view(-1, self._to_linear)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)


if __name__ == '__main__':

    print("Number of train samples: ", len(train_data))
    print("Number of test samples: ", len(test_data))
    print("Detected Classes are: ", train_data.class_to_idx) # classes are detected by folder structure

    model = CNN()    
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    loss_func = nn.BCEWithLogitsLoss()    

    # Training and Testing
    for epoch in range(EPOCHS):        
        for step, (x, y) in enumerate(train_data_loader):
            b_x = Variable(x)   # batch x (image)
            b_y = Variable(y)   # batch y (target)
            output = model(b_x)[0]          
            loss = loss_func(output, b_y)   
            optimizer.zero_grad()           
            loss.backward()                 
            optimizer.step()

            if step % 50 == 0:
                test_x = Variable(test_data_loader)
                test_output, last_layer = model(test_x)
                pred_y = torch.max(test_output, 1)[1].data.squeeze()
                accuracy = sum(pred_y == test_y) / float(test_y.size(0))
                print('Epoch: ', epoch, '| train loss: %.4f' % loss.data[0], '| test accuracy: %.2f' % accuracy)

I think this error is coming from your loss function: loss = loss_func(output, b_y).

The sizes of output and b_y are different. BCEWithLogitsLoss() expects the input and the target to have the same size. Check out the documentation here. Here's a blog post which goes through different loss functions in PyTorch.

The final fc2 layer gives out a tensor of size 2, while your actual target is of size 10.
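
You can see the requirement in isolation with a couple of dummy tensors (just a sketch, independent of your model):

    import torch
    import torch.nn as nn

    loss_func = nn.BCEWithLogitsLoss()

    logits  = torch.randn(10)                     # one raw score per sample in the batch
    targets = torch.randint(0, 2, (10,)).float()  # one 0/1 label per sample

    print(loss_func(logits, targets))             # fine: both are of size [10]

    bad_logits = torch.randn(2)                   # size [2], like your current output
    # loss_func(bad_logits, targets)              # -> ValueError: Target size (torch.Size([10]))
    #                                             #    must be the same as input size (torch.Size([2]))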

What are you trying to solve with nnet? Binary classification?

Yeah, I am trying to do binary classification. What would be the best way to go about solving this issue? Should I change my loss function? Or should my test input be of size 2 for the two labels that I am trying to identify (and how would I go about doing that)?

Thank you for answering btw

Your actual test labels should be of size 2 because it's binary classification. Why are they of size 10?

To get your model running, that's the only thing you need to figure out. See why your test labels are of size 10 in a binary classification application.

What is the output of train_data.classes and test_data.classes?
Alternatively, look inside the train/test folders to see how many classes (folders) exist.

My guess is that your train folder has 2 subfolders (for 2 classes) while your test folder has 10 subfolders (for 10 classes).
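
Something like this (using the dataset objects you already create in your script) would confirm it:

    # Quick sanity check on the class folders ImageFolder discovered in each split
    print("train classes:", train_data.classes, train_data.class_to_idx)
    print("test classes: ", test_data.classes, test_data.class_to_idx)
    assert train_data.classes == test_data.classes, "train/test class folders differ"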

That's weird, because my test folder also has two subfolders (named the same as in the train folder), and when I print train_data.classes and test_data.classes I get the same two classes for both.

I did figure out where the code gets the number 10 from: when I changed my batch size to 100, I got "Target size (torch.Size([100])) must be the same as input size (torch.Size([2]))".

Is there a reason why the batch size is influencing the target size?

Thanks again

Your use case mixes some workflows for binary classification.
You could either:

  • use two output units + nn.CrossEntropyLoss and a target of shape [batch_size] containing the class indices
  • or a single output unit + nn.BCEWithLogitsLoss and a target of shape [batch_size, 1]

Neither approach uses a softmax activation at the end, as both criteria apply the activation internally, so you should remove the softmax.
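
Here's a minimal sketch of both options with random tensors standing in for your model output (shapes only, not your actual model):

    import torch
    import torch.nn as nn

    batch_size = 10

    # Option 1: two output units + CrossEntropyLoss, targets are class indices of shape [batch_size]
    logits_2 = torch.randn(batch_size, 2)             # would come from nn.Linear(512, 2)
    targets_idx = torch.randint(0, 2, (batch_size,))  # LongTensor of 0s and 1s
    loss1 = nn.CrossEntropyLoss()(logits_2, targets_idx)

    # Option 2: one output unit + BCEWithLogitsLoss, targets are floats of shape [batch_size, 1]
    logits_1 = torch.randn(batch_size, 1)             # would come from nn.Linear(512, 1)
    targets_float = torch.randint(0, 2, (batch_size, 1)).float()
    loss2 = nn.BCEWithLogitsLoss()(logits_1, targets_float)

    print(loss1.item(), loss2.item())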

That being said, the shape mismatch is probably created in:

x = x.view(-1, self._to_linear)

Could you use x = x.view(x.size(0), -1) to keep the batch dimension constant?
This could potentially yield a shape mismatch in the feature dimension, which you would need to fix by changing the in_features in the conflicting linear layer.
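
A tiny example of the difference, with a random tensor standing in for the conv output (with your 256x256 inputs it should come out as roughly [batch, 128, 28, 28]):

    import torch

    x = torch.randn(10, 128, 28, 28)     # stand-in for the activation after the three conv/pool blocks

    flat_wrong = x.view(-1, 512)         # flattens across the batch dim, so the batch size silently changes
    flat_right = x.view(x.size(0), -1)   # keeps the batch dim and flattens the rest

    print(flat_wrong.shape)              # torch.Size([1960, 512])
    print(flat_right.shape)              # torch.Size([10, 100352])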


I'm not 100% sure I get how to solve that. I removed the softmax and changed x = x.view(-1, self._to_linear) to x = x.view(x.size(0), -1) as you recommended, and I do get a shape mismatch, but I don't understand which layer is having the issue, and I don't get why the batch size of 10 once again finds its way into the error.

Here is the error message by the way:

RuntimeError: size mismatch, m1: [10 x 100352], m2: [512 x 512] at C:\w\1\s\tmp_conda_3.7_104508\conda\conda-bld\pytorch_1572950778684\work\aten\src\TH/generic/THTensorMath.cpp:197

Is the problem caused in the forward pass or in the layer itself?

Here is a link to my updated code on GitHub after all the small changes I've made with you guys' help, in case the format is getting difficult to follow:

Code on Github

The number of in_features in your first linear layer has to match the number of flattened features of the incoming activation.
In this case the feature calculation seems to be wrong, and you should set in_features = 100352 instead of 512.
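
You can also recompute that number yourself instead of reading it off the error: the dummy input in __init__ is 50x50, but your transforms produce 256x256 images, so the stored self._to_linear comes out too small. A standalone sketch:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Same conv stack as the model, fed with the input size the transforms actually produce
    conv1 = nn.Conv2d(1, 32, 5)
    conv2 = nn.Conv2d(32, 64, 5)
    conv3 = nn.Conv2d(64, 128, 5)

    x = torch.randn(1, 1, 256, 256)
    for conv in (conv1, conv2, conv3):
        x = F.max_pool2d(F.relu(conv(x)), (2, 2))

    print(x.shape)        # torch.Size([1, 128, 28, 28])
    print(x[0].numel())   # 100352 -> in_features for fc1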

Ooooooooh you’re right I missed that, thanks a lot.

I fixed that and am getting some more errors (it truly seems like they never end).

ValueError: Target size (torch.Size([10])) must be the same as input size (torch.Size([10, 2]))

With a batch_size of 10, my output looks like this:

tensor([[0.0476, 0.0160],
        [0.0501, 0.0116],
        [0.0468, 0.0165],
        [0.0522, 0.0088],
        [0.0579, 0.0139],
        [0.0544, 0.0130],
        [0.0488, 0.0174],
        [0.0554, 0.0142],
        [0.0480, 0.0270],
        [0.0497, 0.0193]], grad_fn=<AddmmBackward>)

while my b_y looks like this:

tensor([0, 1, 0, 1, 1, 1, 1, 1, 0, 0])

I’m guessing that my output is the issue, but I’m not sure why it’s outputting two values per image or how to turn that into one value per image. Is there a function I’m missing? Or am I just not using the right loss function?

You’re getting that error in the line where you calculate the loss.

You're using BCEWithLogitsLoss(). The input and the target have to be the same size and have dtype float:

y_pred: (batch_size, *), Float

y_train: (batch_size, *), Float

What you need to do is return only a single output value from your network. To do that, change the line self.fc2 = nn.Linear(512, 2) to self.fc2 = nn.Linear(512, 1).

Once you do this, your network will return an output of size [10, 1]. You will still get an error, because your target is of size [10]. So run .squeeze() on the predicted tensor, which removes the dimension of size 1 and turns [10, 1] into [10], i.e. in your train loop, do the following before you calculate the loss:

    for epoch in range(EPOCHS):
        for step, (x, y) in enumerate(train_data_loader):
            b_x = Variable(x)   # batch x (image)
            b_y = Variable(y)   # batch y (target)
            output = model(b_x)                       # full batch output, shape [batch_size, 1]
            loss = loss_func(output.squeeze(), b_y)   # squeeze -> [batch_size] to match b_y
            ...

Let me know if this works. :slight_smile:

Use what @theairbend3r suggested or alternatively switch to nn.CrossEntropyLoss and treat your binary classification as multi-class classification. :wink:

Alright, so I changed my last layer to a single output and added .squeeze() to remove the extra dimension, and that solved the dimension issue, but now I'm getting another error:

    return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

RuntimeError: result type Float can't be cast to the desired output type Long

Do you guys know what caused this to happen and how to solve it?

Oh, never mind, I managed to solve it by adding the line

b_y = b_y.type_as(output)

That made them the same type, so no problem :slight_smile:
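
For reference, the relevant part of my training loop now looks roughly like this (a sketch; casting with .float() instead of .type_as() would work too):

    for step, (x, y) in enumerate(train_data_loader):
        b_x = Variable(x)                 # batch of images
        b_y = Variable(y)                 # batch of 0/1 labels (LongTensor from ImageFolder)
        output = model(b_x)               # shape [batch_size, 1] with the single-output fc2
        b_y = b_y.type_as(output)         # cast the Long targets to Float to match the logits
        loss = loss_func(output.squeeze(), b_y)   # squeeze -> [batch_size] to match b_y
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()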

Thank you guys so much for all the help, you have been real lifesavers @theairbend3r and @ptrblck <3


Hello. I tried this solution and it worked for me: Target size (torch.Size([10])) must be the same as input size (torch.Size([2])) - #11 by ptrblck. However, I have read that CrossEntropyLoss is recommended for multi-class problems and BCELoss()/BCEWithLogitsLoss() is recommended for binary problems. So are there any conceptual downsides to this hack of using CrossEntropyLoss() for a binary classification problem?

Both approaches would work, but @KFrank has a great detailed explanation here.
