Simple Linear Model loss not decreasing

I’m trying to build my own network to train on the iris dataset, but the loss is not decreasing even after 10,000 iterations.

import torch
import random
import numpy as np
import torch.nn as nn
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt

data = load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2)


X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.float32)

y_train = y_train.view(-1, 1)
y_test = y_test.view(-1, 1)

out1 = nn.Linear(4, 64)
relu1 = nn.ReLU()
out2 = nn.Linear(64, 32)
relu2 = nn.ReLU()
out3 = nn.Linear(32, 3)
sig = nn.Softmax()
model = torch.nn.Sequential(out1, relu1, out2, relu2, out3, sig)
loss_fn = nn.CrossEntropyLoss()
lr = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

sum_loss = 0
losses = []
for i in range(10000):
    j = random.randint(0, X_train.shape[0] - 1)
    pred = model.forward(X_train[j])
    loss = loss_fn(pred.view(-1, 3), y_train[j])
    optimizer.zero_grad()
    loss.backward()
    sum_loss += loss.item()
    optimizer.step()
    if i % 1000 == 999:
        losses.append(sum_loss / 100)
        print("Iterations: {} Loss: {}".format(i + 1, sum_loss / 100))
        sum_loss = 0

The output of the above code is

Iterations: 1000 Loss: 9.004105351567269
Iterations: 2000 Loss: 9.444534380435943
Iterations: 3000 Loss: 8.74448259830475
Iterations: 4000 Loss: 9.184460145831109
Iterations: 5000 Loss: 8.76445415377617
Iterations: 6000 Loss: 9.094449521303178
Iterations: 7000 Loss: 8.954448039531707
Iterations: 8000 Loss: 11.18524711728096
Iterations: 9000 Loss: 12.014446496963501
Iterations: 10000 Loss: 12.0444464969635

Is there anything I’m doing wrong? I’m completely new to PyTorch.

Hi Hari!

One error is that you are using a Softmax after your last Linear
layer, even though you are using CrossEntropyLoss for your loss.
CrossEntropyLoss has (in effect) Softmax built into it. That is, it
expects unnormalized, “raw” scores (logits) for its predictions, rather
than probabilities (as produced by Softmax).

Try your model without the final Softmax layer:

model = torch.nn.Sequential(out1, relu1, out2, relu2, out3)

(I haven’t looked at the rest of your code in any detail; there may be
other issues.)
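
Here’s a quick standalone sanity check you can run yourself; it shows that CrossEntropyLoss on raw logits matches LogSoftmax followed by NLLLoss. (This snippet is just an illustration, not part of your code.)

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)          # batch of 5 samples, 3 classes
target = torch.randint(0, 3, (5,))  # integer class labels

# CrossEntropyLoss on raw logits ...
ce = F.cross_entropy(logits, target)
# ... is the same as log-softmax followed by NLLLoss
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))  # True

# Applying Softmax before CrossEntropyLoss (as in the original model)
# runs the softmax twice, which compresses the scores and stalls training.
double = F.cross_entropy(F.softmax(logits, dim=1), target)
print(ce.item(), double.item())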

[Edit: And another thing … It doesn’t matter, because you shouldn’t
be using Softmax, but if you were using Softmax, it should be
Softmax(dim=1). dim=0 is your batch dimension, while dim=1
is your class (category) dimension. For each sample in your batch
you want your per-class probabilities summed over classes to be equal
to 1. That’s what Softmax(dim=1) will give you.]
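
A tiny standalone illustration of the dim point (again, just an example, not your code):

import torch
import torch.nn as nn

x = torch.randn(4, 3)  # 4 samples in the batch, 3 classes

row_probs = nn.Softmax(dim=1)(x)
print(row_probs.sum(dim=1))  # per-sample probabilities: tensor([1., 1., 1., 1.])

col_probs = nn.Softmax(dim=0)(x)
print(col_probs.sum(dim=0))  # normalizes across the batch instead: tensor([1., 1., 1.])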

Good luck!

K. Frank

Hi Frank,

Thanks for the reply. I removed the Softmax layer, but the issue still persists: the loss is still not decreasing. Even though the weights are being updated, the accuracy of the model remains the same.

I tried running your model on Colab and it appears to be training OK for me. I made a few simplifications:

import torch
import random
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X = data['data']
y = data['target']

X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

out1 = nn.Linear(4, 64)
relu1 = nn.ReLU()
out2 = nn.Linear(64, 32)
relu2 = nn.ReLU()
out3 = nn.Linear(32, 3)
model = torch.nn.Sequential(out1, relu1, out2, relu2, out3)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

sum_loss = 0
for i in range(3000):
    # pick a random training sample (stochastic updates, batch size 1)
    j = random.randint(0, X.shape[0] - 1)
    pred = model.forward(X[j])
    # expand adds the batch dimension of 1 that CrossEntropyLoss expects
    loss = loss_fn(pred.expand(1, 3), y[j].expand(1))
    optimizer.zero_grad()
    loss.backward()
    sum_loss += loss.item()
    optimizer.step()
    if i % 1000 == 999:
        # report the average loss over the last 1000 iterations
        print(sum_loss / 1000)
        sum_loss = 0
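
A side note on the expand calls, in case they look mysterious: they just add the batch dimension of 1 that CrossEntropyLoss expects here, so unsqueeze(0) would do the same job. A quick standalone check (illustrative only):

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
pred = torch.randn(3)     # logits for a single sample, shape (3,)
target = torch.tensor(1)  # its class index, 0-dimensional

# both calls produce the (1, 3) prediction and (1,) target shapes
loss_a = loss_fn(pred.expand(1, 3), target.expand(1))
loss_b = loss_fn(pred.unsqueeze(0), target.unsqueeze(0))
print(torch.allclose(loss_a, loss_b))  # True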

Does this work for you?

Welcome to PyTorch, by the way! You might want to check out some of the tutorials here: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html. I think they’re really helpful.

Yes! It is working now. The only major changes you made are that you didn’t do the train_test_split and used expand() instead of view(). Is there anything else that could have caused this issue?

Thanks for your help!

Debugging can be half the fun. You could try incrementally changing the network and see what causes it to fail.
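
For example, a first incremental step, sketched here under the assumption that the rest of the working version stays unchanged, would be re-adding the train_test_split. One thing to watch when you do: your original post converted y_test to float32, but CrossEntropyLoss targets need to be int64 class indices.

import torch
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], test_size=0.3, random_state=2)

X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
# keep the targets as int64 class indices on both splits for CrossEntropyLoss
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)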
