# Poor Performance for the MNIST Digits problem (using MSELoss and SGD)

I want to treat MNIST digit recognition as a regression problem (as one would for house price prediction).

I used `MSELoss` and the `SGD` optimizer. The last layer of the CNN model is linear with one neuron. The structure of the model is given below:

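``````
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            # assumed: first conv layer (missing from the post); conv2 expects 16 input channels
            nn.Conv2d(1, 16, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # single output neuron: the digit is regressed as one real number
        self.out = nn.Linear(32 * 7 * 7, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten to [batch, 32 * 7 * 7]
        output = self.out(x)       # shape [batch, 1]
        return output, x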
``````

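Training code snippet:

``````
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

optimizer = optim.SGD(params=model.parameters(), lr=LR)
criterion = nn.MSELoss()
total_step = len(loaders['train'])  # number of batches per epoch

model.train()
for epoch in range(NB_EPOCS):
    for i, (images, labels) in enumerate(loaders['train']):
        b_x = Variable(images)   # batch x
        b_y = Variable(labels)   # batch y
        output, _ = model(b_x)   # output has shape [batch, 1]
        loss = criterion(output, b_y.float())  # b_y has shape [batch]
        optimizer.zero_grad()    # clear gradients left over from the previous step
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, NB_EPOCS, i + 1, total_step, loss.item()))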
``````

I am getting a warning:

``````
UserWarning: Using a target size (torch.Size([100])) that is different to the input size (torch.Size([100, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
``````

The loss is very high:

``````
Epoch [1/10], Step [100/600], Loss: 7.7500
Epoch [1/10], Step [200/600], Loss: 8.3003
Epoch [1/10], Step [300/600], Loss: 8.7280
Epoch [1/10], Step [400/600], Loss: 8.4920
Epoch [1/10], Step [500/600], Loss: 8.5399
Epoch [1/10], Step [600/600], Loss: 8.8300
Epoch [2/10], Step [100/600], Loss: 10.8930
Epoch [2/10], Step [200/600], Loss: 10.0020
Epoch [2/10], Step [300/600], Loss: 7.9896
Epoch [2/10], Step [400/600], Loss: 7.2748
Epoch [2/10], Step [500/600], Loss: 9.4017
``````

Moreover, at test time the model predicts 0 for every image:

``````
Test Accuracy of the model on the 10000 test images: %.2f 0.16
Prediction Number: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Actual Number [3 8 1 8 3 0 3 9 0 9 3 3 3 1 2 9 7 4 9 1 4 7 4 7 4 3 7 3 1 4 1 7 4 1 6 2 5
0 0 6 8 8 3 2 5 1 6 3 9 3 8 1 4 1 7 7 5 8 3 2 0 4 3 5 9 3 9 4 7 6 0 7 2 3
9 2 6 7 3 5 6 8 2 3 7 2 6 5 6 3 6 4 5 0 0 7 0 7 6 6]
``````

I think the warning is one of the reasons for the poor performance. Could you tell me how I can resolve the warning and improve the performance?

Hi,
you need to squeeze the input to the loss (your model output) so that the input and the target have the same size.
In your specific case, PyTorch would broadcast both tensors to size `[100, 100]`.

@Unity05 could you tell me how to do the squeeze?

`loss = criterion(output.squeeze(-1), b_y.float())`
Btw, Variables are deprecated. Nowadays, normal tensors can have gradients as well.
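
To make the broadcasting point concrete, here is a dependency-free sketch in plain Python (no PyTorch needed) of what the loss averages over in each case. The two helper functions and the four-sample batch are made up for illustration; in the real training loop the fix is just the `squeeze(-1)` call above.

```python
def mse_broadcast(outputs, targets):
    """MSE when a [N, 1] output meets a [N] target:
    every output is compared with every target (an N x N grid)."""
    diffs = [(o - t) ** 2 for o in outputs for t in targets]
    return sum(diffs) / len(diffs)


def mse_elementwise(outputs, targets):
    """MSE after squeezing the output to shape [N]: one pair per sample."""
    diffs = [(o - t) ** 2 for o, t in zip(outputs, targets)]
    return sum(diffs) / len(diffs)


# Hypothetical batch in which the model predicts every digit perfectly.
targets = [0.0, 3.0, 6.0, 9.0]
outputs = [0.0, 3.0, 6.0, 9.0]

print(mse_elementwise(outputs, targets))  # 0.0
print(mse_broadcast(outputs, targets))    # 22.5
```

Even with perfect predictions the broadcast version reports a large loss, because each output is compared with every label in the batch and not only with its own. That is likely why the loss in the log above never drops much below 8.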


@Unity05 Thanks a lot, it works. The loss is no longer terrible, and the training phase looks reasonable.

Training phase predictions:

``````
[ 7.6779],
[ 4.2894],
[ 4.9621],
[ 3.0036],
[ 1.2166],
[ 4.3545],
[ 6.1130],
[ 8.3136],
[ 6.8330],
[ 2.3176],
[ 4.7833],
[-0.7525],
[ 3.9115],
[ 3.6507]]
``````

Training phase actual labels:

``````
[9, 6, 6, 2, 1, 2, 8, 9, 7, 3, 6, 0, 5, 5]
``````

But at test time I am still getting 0 for every image instead of actual digits!

``````
Test Accuracy of the model on the 10000 test images: %.2f 0.05
Prediction Number: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Actual Number [9 0 5 4 9 4 9 9 6 4 8 6 3 3 7 8 8 9 6 6 7 2 8 1 6 6 7 9 2 0 1 8 9 4 3 2 6
6 0 1 5 2 7 2 8 2 0 5 7 4 6 2 4 2 1 2 8 2 2 9 6 8 1 7 3 4 1 4 2 3 1 6 4 8
7 0 2 5 5 1 4 4 5 1 1 3 7 1 9 0 7 4 2 0 7 0 2 8 5 3]
``````

Any idea why this is happening? Or am I doing something wrong at test time?

``````
# Loading model
model = my_model.get_model()

def test():
    # Test the model
    correct = 0
    total = 0
    test_output, last_layer = model(images)
    pred_y = torch.max(test_output, 1)[1].data.squeeze()
    accuracy = (pred_y == labels).sum().item() / float(labels.size(0))

    print('Test Accuracy of the model on the 10000 test images: %.2f', accuracy)

test()
``````

These are the raw model outputs for the test images:

``````
[ 7.0612],
[ 4.3061],
[ 6.4712],
[ 0.7163],
[ 3.1995],
[ 2.7388],
[ 3.8326],
[ 3.6017],
[ 7.4603],
[ 0.7398],
[ 8.2723],
``````

I think I am doing something wrong in this line:

``````
pred_y = torch.max(test_output, 1)[1].data.squeeze()
``````

That happens because the second output of `torch.max()` contains the indices of the maximum values, not the values themselves.
Furthermore, you set `dim=1`, but your output has only one element along dim 1, so the argmax is always 0. I guess you want to change your network to have 10 output nodes to regress the probabilities for each digit?
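
A small plain-Python sketch of the same effect, using a few of the test outputs quoted above. The `argmax` helper is hypothetical and only mimics what the index returned by `torch.max(..., 1)` represents:

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])


# Each row holds the single regression output for one image (shape [N, 1]).
test_output = [[7.0612], [4.3061], [6.4712], [0.7163]]

pred_y = [argmax(row) for row in test_output]
print(pred_y)  # [0, 0, 0, 0]
```

With only one column per row, the index of the maximum can never be anything other than 0, whatever the predicted value is.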

No, I need only one value. I used the line below and it works fine.

``````
pred_y = test_output.squeeze()
``````
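
One caveat when computing accuracy from a regression output: the raw values are floats, so comparing them with integer labels directly will almost never match. One possible approach, not taken from the posts above, is to round each prediction to the nearest integer and clamp it to the range 0 to 9. A sketch in plain Python with the training-phase predictions and labels posted earlier; `to_digit` is a hypothetical helper:

```python
def to_digit(value):
    """Round a regression output to the nearest integer and clamp it to 0..9."""
    return min(9, max(0, round(value)))


# Training-phase predictions and labels quoted earlier in the thread.
predictions = [7.6779, 4.2894, 4.9621, 3.0036, 1.2166, 4.3545, 6.1130,
               8.3136, 6.8330, 2.3176, 4.7833, -0.7525, 3.9115, 3.6507]
labels = [9, 6, 6, 2, 1, 2, 8, 9, 7, 3, 6, 0, 5, 5]

digits = [to_digit(p) for p in predictions]
correct = sum(d == y for d, y in zip(digits, labels))

print(digits)                 # [8, 4, 5, 3, 1, 4, 6, 8, 7, 2, 5, 0, 4, 4]
print(correct, len(labels))   # 3 14
```

With tensors, the same idea could be expressed with `torch.round` and `torch.clamp` on the squeezed output before comparing it with the labels.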

Okay then.
May I ask why you don't train a classifier for MNIST?
And btw, it's more informative to log the loss with an AverageMeter instead of a single batch's loss.

@Unity05 Out of curiosity: I didn't quite get what you mean by "AverageMeter for the loss logs". Could you tell me a little more?


That's just taking the average over all batches in that interval.
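
For reference, an AverageMeter is usually a small helper class that people write themselves, not a built-in PyTorch API. A minimal sketch, assuming you only need a running mean; the class layout and the `batch_losses` values (the six losses logged for epoch 1 in the question) are for illustration only:

```python
class AverageMeter:
    """Keeps a running average of the values passed to update()."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # n is how many samples (or batches) the value stands for
        self.total += value * n
        self.count += n

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


# Stand-in values: the six losses logged for epoch 1 in the question.
batch_losses = [7.7500, 8.3003, 8.7280, 8.4920, 8.5399, 8.8300]

meter = AverageMeter()
for loss_value in batch_losses:  # in training this would be loss.item()
    meter.update(loss_value)

print('Average loss: {:.4f}'.format(meter.avg))  # Average loss: 8.4400
```

In a training loop you would call `meter.update(loss.item())` on every batch, print `meter.avg` every 100 steps, and then call `meter.reset()` to start the next interval.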

Hi @akib62, for good performance on MNIST you could use the model below with `nn.NLLLoss()`.
You should get over 99% accuracy with the Adam optimizer, a learning rate of 0.0001, and 30 epochs.
I got 99.56% with this model.
Model:

``````
from torch import nn
import torch

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1)
        self.activation1 = nn.ReLU()

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        self.activation2 = nn.ReLU()

        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
        self.activation3 = nn.ReLU()
        self.conv4 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=0)
        self.activation4 = nn.ReLU()

        self.linear1 = nn.Linear(128 * 5 * 5, 10)
        self.soft = nn.LogSoftmax(dim=1)

    def forward(self, xb):
        xb = self.conv1(xb)
        xb = self.activation1(xb)

        xb = self.conv2(xb)
        xb = self.activation2(xb)

        xb = self.conv3(xb)
        xb = self.activation3(xb)
        xb = self.conv4(xb)
        xb = self.activation4(xb)

        xb = xb.reshape(-1, 128 * 5 * 5)

        xb = self.linear1(xb)
        xb = self.soft(xb)

        return xb
``````
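
If you wonder where the `128 * 5 * 5` in the linear layer comes from, the spatial size after each convolution can be checked by hand. A short plain-Python sketch, assuming the usual 28 x 28 MNIST input and dilation 1; `conv_out` is a hypothetical helper, not a PyTorch function:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a conv layer (dilation 1) for a square input."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # assumed MNIST input: 1 x 28 x 28
sizes = []
# (kernel, stride, padding) of conv1 .. conv4 in the model above
for kernel, stride, padding in [(3, 2, 1), (3, 2, 1), (3, 1, 1), (3, 1, 0)]:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)              # [14, 7, 7, 5]
print(128 * size * size)  # 3200
```

So the last convolution yields 128 feature maps of 5 x 5, which flatten to the 3200 inputs that `linear1` expects.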

If you are interested, the full training loop and the accuracy test are on my GitHub; feel free to try them.
