"Target and input must have the same number of elements" ERROR for training ConvNet

johnvalen1 · May 5, 2020, 11:08pm

Hi guys.

I’m training a convNet on gray-scale 224*224 images with 2 classes.

This is my convNet:

class Frequentist_CNN(ModuleWrapper):
    def __init__(self, outputs, inputs):
        super(Frequentist_CNN, self).__init__()

        self.num_classes = outputs
        self.num_channels = inputs

        self.layer1 = nn.Sequential(
            nn.Conv2d(inputs, 6, kernel_size=5, stride=1, padding=2),
            nn.Softplus(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0),
            nn.Softplus(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(16, 16, kernel_size=5, stride=1, padding=1),
            nn.Softplus(),
            nn.MaxPool2d(kernel_size=3, stride=2))
        
        self.flatten = FlattenLayer(25)
        self.fc1 = nn.Linear(25, 1000)
        self.act1 = nn.Softplus()
        self.fc2 = nn.Linear(1000, 500)
        self.act2 = nn.Softplus()
        self.fc3 = nn.Linear(500, outputs)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        # out = out.view(1, -1)
        out = self.fc1(out)
        #out = nn.Softplus(self.fc2(out))
        #out = nn.Softplus(self.fc3(out))
        out = self.fc2(out)
        out = self.fc3(out)
        return out

And I’m running the training loop:

# Train the model

freq_net = Frequentist_CNN(num_classes, num_channels)#.to(device)

# Loss and optimizer
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(freq_net.parameters(), lr=0.01)

total_step = len(train_loader)
loss_list = []
acc_list = []
for epoch in range(n_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Run the forward pass
        print("Images have shape: {}".format(images.shape))
        # images, labels = images.to(device), labels.to(device)
        #labels = labels.unsqueeze(1)
        outputs = freq_net(images)
        #outputs = outputs.argmax(dim=1, keepdim=True)
        print("Outputs have shape: {}".format(outputs.shape))
        print("Labels have shape: {}".format(labels.shape))

        loss = criterion(outputs, labels)
        loss_list.append(loss.item())

        # Backprop and perform Adam optimisation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track the accuracy
        total = labels.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct = (predicted == labels).sum().item()
        acc_list.append(correct / total)

        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%'
                  .format(epoch + 1, n_epochs, i + 1, total_step, loss.item(),
                          (correct / total) * 100))

Giving me the output AND error:
(Notice the shapes)

Images have shape: torch.Size([32, 1, 224, 224])
Outputs have shape: torch.Size([32, 16, 25, 2])
Labels have shape: torch.Size([32])


/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:516: UserWarning: Using a target size (torch.Size([32])) that is different to the input size (torch.Size([32, 16, 25, 2])) is deprecated. Please ensure they have the same size.
  return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)

...

ValueError: Target and input must have the same number of elements. target nelement (32) != input nelement (25600)

As you can see, I’ve tried unsqueezing labels, training on the GPU… I’ve also tried CrossEntropyLoss(). That didn’t work either.

Does anybody have any idea why this may be happening?

Thanks so much!
John

ptrblck · May 6, 2020, 6:52am

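It seems you are not flattening the activations in your model before feeding them to the linear layers, so I would recommend uncommenting the view operation and changing it to out = out.view(out.size(0), -1). This keeps the batch dimension and flattens everything else, whereas out.view(1, -1) would merge the whole batch into a single row.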
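
To see where the [32, 16, 25, 2] output comes from, the spatial size can be traced through the three conv/pool stages by hand. The sketch below is plain Python and assumes the standard output-size formula for Conv2d and MaxPool2d (dilation 1, floor rounding); conv_out is a hypothetical helper, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224                                             # input height/width
size = conv_out(size, kernel=5, stride=1, padding=2)   # layer1 conv
size = conv_out(size, kernel=2, stride=2)              # layer1 pool
size = conv_out(size, kernel=5, stride=1, padding=0)   # layer2 conv
size = conv_out(size, kernel=2, stride=2)              # layer2 pool
size = conv_out(size, kernel=5, stride=1, padding=1)   # layer3 conv
size = conv_out(size, kernel=3, stride=2)              # layer3 pool

channels = 16                                          # out_channels of layer3
flat_features = channels * size * size                 # features per sample after flattening
print(size, flat_features)
```

Under these assumptions layer3 produces [32, 16, 25, 25], and without flattening nn.Linear(25, 1000) is applied to the last dimension only, which gives the [32, 16, 25, 2] output. Once the activations are flattened, each sample has 16 * 25 * 25 features, so the in_features of fc1 would presumably have to change from 25 to that value as well.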

Also, you are not using activation functions between the linear layers: act1 and act2 are defined in __init__ but never called in forward.

nn.BCELoss expects a probability input, so you should either apply sigmoid on the output or use nn.BCEWithLogitsLoss.
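
As a minimal illustration of the two options, the sketch below uses made-up logits and targets (not values from the model above) and compares sigmoid followed by nn.BCELoss with nn.BCEWithLogitsLoss on the raw logits. Note that in both cases the target is a float tensor with the same shape as the model output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up raw model outputs (logits), one value per sample.
logits = torch.randn(4)
# Binary targets as floats, with the same shape as the logits.
targets = torch.tensor([0.0, 1.0, 1.0, 0.0])

# Option 1: turn logits into probabilities first, then use BCELoss.
loss_a = nn.BCELoss()(torch.sigmoid(logits), targets)

# Option 2: pass the raw logits to BCEWithLogitsLoss, which applies
# the sigmoid internally.
loss_b = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_a.item(), loss_b.item())
```

Both options should give the same loss value up to floating-point error; the second one is usually preferred for numerical stability.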

However, based on the target shape it seems you are treating this as a multi-class classification problem, with one class index per sample.
If that’s the case, you should use nn.CrossEntropyLoss instead.
The model output should have the shape [batch_size, nb_classes] and the target [batch_size] containing the class indices in the range [0, nb_classes-1].
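
Putting these points together, a corrected model could look roughly like the sketch below. It is not the original class: it subclasses plain nn.Module instead of ModuleWrapper (which is not shown in this thread), SmallCNN is a hypothetical name, and the first linear layer assumes 224x224 inputs, for which the three conv/pool stages leave 16 channels of 25x25 activations.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):  # hypothetical name, plain nn.Module
    def __init__(self, num_classes=2, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 6, kernel_size=5, stride=1, padding=2),
            nn.Softplus(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0),
            nn.Softplus(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 16, kernel_size=5, stride=1, padding=1),
            nn.Softplus(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            # 16 channels x 25 x 25 positions, assuming 224x224 inputs
            nn.Linear(16 * 25 * 25, 1000),
            nn.Softplus(),
            nn.Linear(1000, 500),
            nn.Softplus(),
            nn.Linear(500, num_classes),  # raw logits, no softmax here
        )

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)  # keep batch dim, flatten the rest
        return self.classifier(out)

model = SmallCNN()
images = torch.randn(4, 1, 224, 224)  # random stand-in batch
labels = torch.tensor([0, 1, 1, 0])   # class indices, dtype int64
logits = model(images)                # shape [batch_size, nb_classes]
loss = nn.CrossEntropyLoss()(logits, labels)
print(logits.shape, loss.item())
```

nn.CrossEntropyLoss applies log-softmax internally, so the model returns raw logits and the labels stay as a 1-D tensor of class indices.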

johnvalen1 · May 6, 2020, 8:16pm

Hi! Thanks for all that! What ended up working was nn.CrossEntropyLoss() along with out = out.view(out.size(0), -1) instead of my original out = out.view(1, -1).

Now… I have a new problem, although it’s slightly unrelated.
Basically, I suspect that my model isn’t properly saving and loading.

I run the training loop where I define the directory to save to:

# Train the model

freq_net = Frequentist_CNN(num_classes, num_channels).to(device)

# Loss and optimizer
# criterion = nn.BCEWithLogitsLoss()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(freq_net.parameters(), lr=0.01)

# Where to save the model:
freq_ckpt_dir = 'saved_model/COVID/model_custom/'
# File name
freq_ckpt_name = 'freq_COVID_CNN.pt'

path_freq_CNN = freq_ckpt_dir + freq_ckpt_name

# If it doesn't exist, make it.
if not os.path.exists(freq_ckpt_dir):
    os.makedirs(freq_ckpt_dir, exist_ok=True)     


total_step = len(train_loader)
loss_list = []
acc_list = []
# Train
print("Training.....")
for epoch in range(n_epochs):
    for i, (images, labels) in enumerate(train_loader):
        print("Batch: \t{}".format(i+1))
        # Run the forward pass
        #print("Images have shape: {}".format(images.shape))
        images, labels = images.to(device), labels.to(device)
        #labels = labels.unsqueeze(1)
        outputs = freq_net(images).to(device)
        #outputs = outputs.argmax(dim=1, keepdim=True)
        #print("Outputs have shape: {}".format(outputs.shape))
        #print("Labels have shape: {}".format(labels.shape))

        loss = criterion(outputs, labels)
        loss_list.append(loss.item())

        # Backprop and perform Adam optimisation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track the accuracy
        total = labels.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct = (predicted == labels).sum().item()
        acc_list.append(correct / total)

        if (i + 1) % 4 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%'
                  .format(epoch + 1, n_epochs, i + 1, total_step, loss.item(),
                          (correct / total) * 100))
            

        # Save.
        torch.save(freq_net.state_dict(), path_freq_CNN)  
print("Done training.")

And it gets this performance after 10 epochs:

Epoch [10/10], Step [4/5], Loss: 0.8934, Accuracy: 78.12%

But now I go load it and test:

# Test the model

freq_net.load_state_dict(torch.load(path_freq_CNN))


freq_net.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = freq_net(images).to(device)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 60 test images: {} %'.format((correct / total) * 100))

Only to see:

Test Accuracy of the model on the 60 test images: 50.0 %

My test-set is balanced, so it seems to be outputting just one class every time.
Is the problem that I’m not saving/loading correctly? It seems my test_loader is defined correctly, so that can’t be it…

Thanks so much!

ptrblck · May 7, 2020, 12:37am

I think your saving and loading is correct.
However, it seems you are comparing the training accuracy with the validation accuracy, so I would recommend adding a validation loop before saving and comparing its result with that of the restored model.

johnvalen1 · May 7, 2020, 1:14am

Hi, I’m not sure I understand…

In the first case, I am training and getting a training accuracy of 78.12%. I save the model.

I then load it, and test it on my test data. It seems to be getting 50.0% every time (balanced binary test set, so it seems to be outputting the same thing every time, perhaps).

I did not include a validation set because the dataset is so small… so I did training and testing only.

ptrblck · May 7, 2020, 2:20am

Sorry for the confusion.
I just think that your storing and loading might be alright.
You could add the testing loop in the first script, store the model, load it in the other script and retest.
If both test scores are ~50%, then the serialization is alright and your model might be overfitting.
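
A minimal sketch of that check, using a tiny stand-in model and random data rather than the real network and test_loader (the evaluate helper, the file name and the data are all placeholders):

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Return the accuracy of model over all batches in loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.size(0)
    return correct / total

def make_model():
    # Stand-in for the real network.
    return nn.Sequential(nn.Linear(8, 16), nn.Softplus(), nn.Linear(16, 2))

torch.manual_seed(0)
# Random stand-in for the 60-image test set.
data = TensorDataset(torch.randn(60, 8), torch.randint(0, 2, (60,)))
test_loader = DataLoader(data, batch_size=32)

model = make_model()
acc_before = evaluate(model, test_loader)  # test score before saving

path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

restored = make_model()                    # fresh instance, new random weights
restored.load_state_dict(torch.load(path))
acc_after = evaluate(restored, test_loader)

print(acc_before, acc_after)
```

If the two numbers match, the save/load round trip is preserving the weights, and a low test score then points at the model or the data rather than at serialization.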