# Strange MSELoss behaviour

Hi!
I am setting up a ResNet model for a regression problem (as I mentioned in a previous topic). My performance is a bit lower than expected. Right now I am using `criterion = nn.L1Loss()`, but I would like to switch to `criterion = nn.MSELoss()` in order to add that quadratic component. However, when I make this change, `print(loss.item())` grows exponentially. In just a few iterations of the loop it ends up being NaN.

I have seen that this issue has happened to other people as well, but I couldn't solve it from those threads since every case was totally different. I am using 300x300 (or even 200x200) images and LR = 0.1. When I decrease it to 0.03, the loss values are slightly lower, but it just takes one more iteration to reach NaN.

I don't really know where the problem with this loss criterion could be. Below I attach the code I am running in case it helps.

```python
criterion = nn.MSELoss()

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 150], gamma=0.1, last_epoch=-1)

def train(epoch, net):
    print('\nEpoch: {} ==> lr: {}'.format(epoch, scheduler.get_last_lr()))
    net.train()
    train_loss = 0
    total = 0  # related to display options

    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)
        targets = targets.to(dtype=torch.float)

        optimizer.zero_grad()  # without this, gradients accumulate across batches
        outputs = net(inputs)

        # Note: outputs is [batch_size, 1]; if targets is [batch_size],
        # MSELoss will broadcast. targets.unsqueeze(1) makes the shapes match.
        loss = criterion(outputs, targets)

        loss.backward()
        optimizer.step()

        # print(loss.item())

        train_loss += loss.item()
        total += targets.size(0)
    # Loss of the epoch is calculated as: train_loss / (batch_idx + 1)
```
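To give a sense of the scale difference between the two criteria, here is a quick pure-Python comparison with made-up per-sample errors of roughly the size an untrained network would produce on my data (illustration only, not my actual values):

```python
# Compare L1 (mean absolute error) and MSE (mean squared error)
# for the same per-sample errors, e.g. a network predicting ~0
# while the targets sit around +/-50 degrees.
errors = [-48.1, -49.3, -56.2, -49.5, -45.6, -50.0, -50.4, -47.9]

l1 = sum(abs(e) for e in errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)

print(f"L1 loss:  {l1:.1f}")   # ~49.6
print(f"MSE loss: {mse:.1f}")  # ~2470.8
```

So the very first MSE values are already a few thousand, and with LR = 0.1 the correspondingly larger gradients could push the weights far enough to diverge.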

What is the problem you're trying to solve? Is it a segmentation/classification problem?
And what is `outputs_mean`? Is it a probability matrix with dimensions [batch_size, num_labels], or a mask matrix with dimensions [H, W]? I think it would be easier to answer if you could elaborate on the question a bit more.

Ok, sorry, I made some changes to the code and forgot to fix that. The initial code was based on a classifier, which is why there were some leftover variables that weren't needed at all. I edited the first post to make it clearer.

The input of the network is a single-channel (black and white) image of [H, W]. The output of the network, right now, is a single continuous variable that ranges from -60 to 60 degrees (related to the orientation of a robot). Therefore, the `outputs` value inside the loop is a tensor whose shape depends on the batch size (8) and the number of network outputs (1), i.e. [8, 1].

Example of printing `outputs` during one iteration of the for loop with batch size = 8:

```
tensor([[-48.1405],
        [-49.2519],
        [-56.2093],
        [-49.5011],
        [-45.5890],
        [-50.0405],
        [-50.4176],
```
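Given that range, one thing I could try is normalizing the targets to [-1, 1] before computing the loss, so the MSE stays around O(1). A minimal sketch, assuming the range really is exactly [-60, 60] degrees (`MAX_ANGLE`, `normalize`, and `denormalize` are my own names, not from the code above):

```python
# Scale +/-60 degree targets into [-1, 1] before the loss,
# and undo the scaling on the network output at inference time.
MAX_ANGLE = 60.0  # assumed maximum magnitude of the target angle

def normalize(deg):
    return deg / MAX_ANGLE

def denormalize(pred):
    return pred * MAX_ANGLE

# e.g. a target of -48.1405 degrees becomes about -0.802
print(normalize(-48.1405))
print(denormalize(normalize(-48.1405)))  # round-trips back to -48.1405
```

With targets in [-1, 1], an error of the size seen above would give an MSE around 0.6 instead of ~2500, which should be much friendlier to LR = 0.1.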