# MNIST dataset - Why is my loss so high? [beginner]

I wanted to build a simple ANN and train it from scratch on the MNIST dataset. The accuracy values look fine, as expected, but the loss is way too high, as if I were not computing it correctly. What am I missing in the code below?

```python
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from itertools import product

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        # hidden layers
        self.h1 = nn.Linear(in_features=784, out_features=196)
        self.h2 = nn.Linear(in_features=196, out_features=49)
        # output layer, 10 units - one for each digit
        self.output = nn.Linear(49, 10)

    def forward(self, x):
        # (1) first hidden layer
        x = self.h1(x)
        x = F.relu(x)
        # (2) second hidden layer
        x = self.h2(x)
        x = F.relu(x)
        # (3) output layer
        x = self.output(x)
        # x = F.softmax(x, dim=1) -> already in cross-entropy loss function??
        return x

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

train_set = torchvision.datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=transforms.Compose([transforms.ToTensor(),
                                  transforms.Normalize((0.1307,), (0.3081,))]))

parameters = dict(lr=[.01], batch_size=[32, 64])
param_values = [v for v in parameters.values()]

for lr, batch_size in product(*param_values):
    comment = f' batch_size={batch_size} lr={lr}'
    print(comment)
    network = Network()
    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=batch_size, shuffle=True)
    optimizer = optim.Adam(network.parameters(), lr=lr)

    for epoch in range(3):
        total_loss = 0
        total_correct = 0
        all_preds = torch.tensor([])
        for batch in train_loader:
            images, labels = batch  # Get Batch
            preds = network(images.reshape(-1, 28*28))  # Pass Batch
            all_preds = torch.cat((all_preds, preds.detach()), dim=0)
            loss = F.cross_entropy(preds, labels)  # Calculate Loss
            optimizer.zero_grad()  # Clear gradients from the previous step
            loss.backward()  # Calculate Gradients
            optimizer.step()  # Update Weights

            total_loss += loss.item() * batch_size  # is this correct? why?
            total_correct += get_num_correct(preds, labels)

        print(f"epoch: {epoch}; total loss: {total_loss}; "
              f"total correct: {total_correct}")
```

Your `total_loss` is a sum over all samples in your `Dataset`, so it grows with the size of the dataset. Usually you report the average loss per sample instead; to get it, divide `total_loss` by `len(train_set)`.
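A minimal sketch of that bookkeeping, reusing the names from your loop (scaling by `images.size(0)` rather than `batch_size` is my own tweak, but it keeps the sum exact when the final batch is smaller):

```python
total_loss = 0.0
for images, labels in train_loader:
    preds = network(images.reshape(-1, 28*28))
    loss = F.cross_entropy(preds, labels)  # mean loss over this batch
    # ... zero_grad / backward / step ...
    # undo the mean by scaling with the actual batch size
    total_loss += loss.item() * images.size(0)

avg_loss = total_loss / len(train_set)  # average loss per sample
print(f"avg loss per sample: {avg_loss:.4f}")
```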

1. `nn.CrossEntropyLoss` combines `log_softmax` and `nn.NLLLoss` in a single call, which is why there is no need for a `Softmax` layer at the end of the network when cross-entropy is used as the criterion.
2. By default (`reduction='mean'`), the `criterion(outputs, targets)` call returns the loss averaged over the batch. So in order to get the sum of all losses over a batch, you multiply back: `loss.item() * batch_size`. Both points are verified in the snippet below.
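A self-contained check with random tensors standing in for network outputs (the shapes and seed are arbitrary):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(64, 10)           # fake network outputs for a batch of 64
labels = torch.randint(0, 10, (64,))   # fake integer class targets

# 1. cross_entropy == nll_loss applied to log_softmax
ce = F.cross_entropy(logits, labels)
nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, nll))  # True

# 2. the default reduction is the batch mean, so mean * batch_size == sum
ce_sum = F.cross_entropy(logits, labels, reduction='sum')
print(torch.allclose(ce * logits.size(0), ce_sum))  # True
```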