Loss functions don't work with image classification labels

I started working with PyTorch a few weeks ago (no prior knowledge of ML).

I want to build an image classifier that detects whether an image shows a cat or a dog.

I have labels in the form of:

tensor([[0., 1.],
        [0., 1.],
        [1., 0.],
        ...,
        [0., 1.],
        [0., 1.],
        [0., 1.]])

This is my network class:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(50*50, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 2)
        
    def forward(self, x):

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        
        return F.log_softmax(x, dim=1)  
        

net = Net()

This is my training loop using optim.Adam and criterion = nn.CrossEntropyLoss():

EPOCHS=6

for epoch in range(EPOCHS):
    for key in range(24946):
        X = data[key]
        X = torch.unsqueeze(torch.flatten(X), dim=0)
        y = out[key]
        net.zero_grad()
        output = net(X)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()
    print(loss)

The shape of the labels is torch.Size([2])
The shape of the inputs to the network is torch.Size([1, 2500])

However, when I try to train it, it gives me the following error:
ValueError: Expected input batch_size (1) to match target batch_size (2).

The desired labels are tensors of size 2 (e.g. tensor([0., 1.])) and the output of my network is also of size 2. What’s the reason for this? Any help is appreciated.

Could you please provide the shapes of:

  • The actual network input (I think this will be X[key].shape). Generally speaking, it's a good idea to follow the PyTorch convention of [Batch_size, Channels, Height, Width], so [1, 1, 50, 50] for your example, I think.

  • The target label shape (an actual printout, so Y[key].shape).

  • The network output shape (output.shape).

It might help if we can see the explicit print function outputs, that’s all :)

The shape of the labels is torch.Size([2])
The shape of the inputs to the network is torch.Size([1, 2500])

Your label shape in the inner training loop seems to be wrong. It should be (1, 2), not (2). A shape of (2) is read as targets for two samples, which is why the error reports a target batch size of 2. (1, 2) means you have a target for one sample with 2 classes (essentially a one-hot vector).
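
To make the shape requirement concrete, here is a minimal sketch with random stand-in logits. It assumes nn.CrossEntropyLoss as in the question; the tensors `logits` and `y` are made up for illustration. Converting the one-hot label to a class index should work on any PyTorch version, while passing a (1, 2) float target (class probabilities) assumes PyTorch 1.10 or later.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Stand-in for net(X): one sample, two classes -> shape [1, 2]
logits = torch.randn(1, 2)

# One-hot label as stored in the question -> shape [2]
y = torch.tensor([0., 1.])

# Option 1: convert the one-hot vector to a class index of shape [1]
target_index = y.argmax().unsqueeze(0)
loss_index = criterion(logits, target_index)

# Option 2 (assumes PyTorch >= 1.10): keep the one-hot vector, but add the
# batch dimension so it has the same shape as the logits, [1, 2]
target_onehot = y.unsqueeze(0)
loss_onehot = criterion(logits, target_onehot)

print(target_index.shape, target_onehot.shape)
print(loss_index.item(), loss_onehot.item())
```

Both options describe the same label, so the two losses should agree up to floating-point error.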

Generally, the training loop should contain two nested loops: the outer one over epochs and the inner one over batches.
I am struggling to understand your inner loop, though.
It seems that your dataset has the shape [n_samples, img_height, img_width], as the images are grayscale.
First you need to reshape it to [n_samples, 1, img_height, img_width].
You may want to rewrite your inner loop like this:

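for batchIdx in range(len(trainX)):
    x = trainX[batchIdx]  # one sample, as your batch size is 1
    # Flatten it, as you are using linear layers. You could use conv layers though.
    x = x.flatten().unsqueeze(0)  # shape [1, 2500]
    y = trainY[batchIdx].unsqueeze(0)  # shape [1, 2], so the batch dimensions match
    # Now the rest of your code.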
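
For reference, the two nested loops can also be written with a DataLoader, which adds the batch dimension for you. This is only a sketch under assumptions: `trainX` and `trainY` are filled with random stand-in data in the shapes discussed above, the small nn.Sequential model stands in for `Net`, and the batch size, epoch count, and learning rate are arbitrary.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data: 32 grayscale 50x50 images and one-hot labels
trainX = torch.rand(32, 50, 50)
trainY = torch.eye(2)[torch.randint(0, 2, (32,))]  # shape [32, 2]

# Add the channel dimension: [n_samples, 1, img_height, img_width]
trainX = trainX.unsqueeze(1)

# CrossEntropyLoss accepts class indices of shape [batch_size]
dataset = TensorDataset(trainX, trainY.argmax(dim=1))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

# Stand-in for Net: flatten each image, then apply linear layers
net = nn.Sequential(nn.Flatten(), nn.Linear(50 * 50, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

EPOCHS = 2
for epoch in range(EPOCHS):          # outer loop: epochs
    for x, y in loader:              # inner loop: batches
        optimizer.zero_grad()
        output = net(x)              # shape [8, 2]
        loss = criterion(output, y)  # y has shape [8]
        loss.backward()
        optimizer.step()
    print(epoch, loss.item())
```

With a batch size of 8, the output has shape [8, 2] and the target has shape [8], so the batch sizes match.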

Hope this helps
