The size of tensor a (10) must match the size of tensor b (100) at non-singleton dimension 1

Hi there, I'm following a tutorial on a simple neural network with 2 hidden layers, and I wanted to add MSE and CE charts at the end of the file, but I have no idea what I should do. Any ideas?

import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as f
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import sklearn 
import numpy as np
import pandas as pd
import sklearn.metrics
import matplotlib.pyplot as plt

torch.manual_seed(101)

Transform = transforms.ToTensor()

train = torchvision.datasets.MNIST('', train=True, download=True, transform=Transform)
test = torchvision.datasets.MNIST('', train=False, download=True, transform=Transform)

trainset = DataLoader(train, batch_size=100, shuffle=True)
testset = DataLoader(test, batch_size=500, shuffle=False)

class Model(nn.Module):
    def __init__(self, input_size=784, output_size=10, layers=[120,84]):
        super().__init__()
        self.d1 = nn.Linear(input_size, layers[0])
        self.d2 = nn.Linear(layers[0], layers[1])
        self.d3 = nn.Linear(layers[1], output_size)
    
    def forward(self, X):
        X = f.relu(self.d1(X))
        X = f.relu(self.d2(X))
        X = self.d3(X)
        return f.log_softmax(X, dim=1)

model = Model()

ce = nn.CrossEntropyLoss()
mse = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

train_losses = []
train_mse_losses = []
test_losses = []
test_mse_losses = []

train_correct = []
test_correct = []

for i in range(10):
    trn_cor = 0
    tst_cor = 0
    for b, (X_train, y_train) in enumerate(trainset):
        b+=1
        y_pred = model(X_train.view(100, -1))
        loss = ce(y_pred, y_train)
        loss_mse = mse(y_pred, y_train)

        predicted = torch.max(y_pred.data, 1)[1]
        batch_cor = (predicted==y_train).sum()
        trn_cor += batch_cor

        optimizer.zero_grad()
        loss.backward()
        loss_mse.backward()
        optimizer.step()

        if b%600 == 0:
            print(f'epoch: {i:2} train loss: {loss.item():10.6f}')

    train_losses.append(loss)
    train_mse_losses.append(loss_mse)
    train_correct.append(trn_cor)

    with torch.no_grad():
        for b, (X_test, y_test) in enumerate(testset):
            y_val = model(X_test.view(500, -1))

            predicted = torch.max(y_val.data, 1)[1]
            tst_cor += (predicted == y_test).sum()
    
    loss = ce(y_val, y_test)
    loss_mse = mse(y_val, y_test)
    test_losses.append(loss)
    test_mse_losses.append(loss_mse)
    test_correct.append(tst_cor)

print(f'test acc: {test_correct[-1].item()*100/10000:.3f}%')

plt.subplot(3,1,1)
plt.plot(train_losses, label='training loss')
plt.plot(test_losses, label='validation loss')
plt.title('Loss at the end of each epoch')

plt.subplot(3,1,3)
plt.plot([t/600 for t in train_correct], label='training acc')
plt.plot([t/100 for t in test_correct], label='validation acc')
plt.title('Accuracy at the end of each epoch')

plt.legend()

Everything was working until I added

mse = nn.MSELoss()

Hi Igor!

Your problem is that CrossEntropyLoss and MSELoss work rather
differently (both conceptually and mechanically).

CrossEntropyLoss expects its input (your y_pred) to be a set of
logits (see below) for the classes your model outputs, typically of
shape [nBatch, nClass] (in your case [100, 10]), and its target
(your y_train) to be integer class labels, typically of shape [nBatch]
(no nClass dimension) with values that range from 0 to nClass - 1.
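
For concreteness, here is a minimal standalone sketch of the shapes
CrossEntropyLoss wants (using your batch size of 100 and your 10
classes):

    import torch
    import torch.nn as nn

    ce = nn.CrossEntropyLoss()
    logits = torch.randn(100, 10)           # input: [nBatch, nClass] scores
    labels = torch.randint(0, 10, (100,))   # target: [nBatch] integer labels, 0 to nClass - 1
    print(ce(logits, labels))               # a single scalar loss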

In contrast, MSELoss expects its input and target to have the same
shape as one another and uses mean-squared-error to measure, on
average, how close the individual elements of input and target are
to one another. You are passing an input of shape [100, 10] and a
target of shape [100] to MSELoss, hence the error.
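
You can reproduce that error in isolation (same shapes as your
batch, random values):

    import torch
    import torch.nn as nn

    mse = nn.MSELoss()
    y_pred = torch.randn(100, 10)            # shaped like your model output
    y_train = torch.randint(0, 10, (100,))   # shaped like your integer labels
    mse(y_pred, y_train)                     # RuntimeError: The size of tensor a (10) must
                                             # match the size of tensor b (100) at
                                             # non-singleton dimension 1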

Furthermore, conceptually, a network that is trained to perform
classification outputs logits (or other probability-like values) for each
of the classes. In the typical use case, these values are not naturally
comparable to other numbers, which is to say, using them with
MSELoss doesn’t really make sense.

You need to think about the meaning of your y_train values (and
similarly y_pred) and whether it even makes sense to use them
with MSELoss.
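
If, after thinking it through, you still want an mse-style chart, one
hypothetical way to make both the shapes and the meanings line
up is to compare predicted probabilities with a one-hot encoding
of the labels, so that both tensors have shape [nBatch, nClass].
A sketch:

    import torch
    import torch.nn as nn
    import torch.nn.functional as f

    mse = nn.MSELoss()
    log_probs = f.log_softmax(torch.randn(100, 10), dim=1)   # stand-in for your y_pred
    labels = torch.randint(0, 10, (100,))                    # stand-in for your y_train

    probs = torch.exp(log_probs)                             # probabilities in [0, 1]
    one_hot = f.one_hot(labels, num_classes=10).float()      # target of shape [100, 10]
    loss_mse = mse(probs, one_hot)                           # shapes now match

This just illustrates the mechanics; whether such a number is worth
charting is up to you.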

As an aside, it’s a little more efficient to go backward() through your
network just once (per optimizer step), so you could rewrite this as

        loss_total = loss + loss_mse
        loss_total.backward()
        # or just  (loss + loss_mse).backward()
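
(As written, calling loss.backward() and then loss_mse.backward()
separately would also fail with a "Trying to backward through the
graph a second time" error, because both losses share the same
forward graph, unless you pass retain_graph = True to the first
call. Summing the losses and calling backward() once avoids that.)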

Note that CrossEntropyLoss has log_softmax() built into it, so you
don't want the log_softmax() in your forward(). CrossEntropyLoss
takes raw-score logits that run
from -inf to inf and are typically just the output of your final Linear
layer.
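
Concretely, that would mean ending your forward() with the raw
Linear output (a sketch of your Model.forward()):

    def forward(self, X):
        X = f.relu(self.d1(X))
        X = f.relu(self.d2(X))
        return self.d3(X)   # raw-score logits -- no log_softmax() here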

If you want that final log_softmax() you should use NLLLoss instead
of CrossEntropyLoss. (CrossEntropyLoss is just log_softmax()
and NLLLoss put together for you.)
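
You can check that equivalence with a quick standalone sketch:

    import torch
    import torch.nn as nn
    import torch.nn.functional as f

    logits = torch.randn(100, 10)
    labels = torch.randint(0, 10, (100,))

    loss_ce = nn.CrossEntropyLoss()(logits, labels)
    loss_nll = nn.NLLLoss()(f.log_softmax(logits, dim=1), labels)
    print(torch.allclose(loss_ce, loss_nll))   # True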

Best.

K. Frank