Why am I getting the same output values for every data point in my ANN model for multi-class classification?

In my code, I am taking a random array as the dataset. Each row of the array has 4 values, and each row is one data point. So if the total number of rows is, say, 10,000, then I have 10,000 data points.
The task is to feed one row at a time to the model:
input layer - 4 nodes, one for each of the 4 values in a row.
number of hidden layers - 2 (for now)
output layer - 3 nodes for the 3 classes.
Class labels are 0, 1, 2.

Upon training, each output contains 3 probabilities, but all 3 of them fall in the same range, i.e. around 0.33. I don't understand why.
Also, in validation testing, the output for every data point (each row) is the same.
I have tried many variations, but I still don't understand why I am getting the same output each time.
I have pasted the code here:

import torch
import torch.nn as nn
#import torch.nn.functional as f
import torch.optim as optim
#from torch.autograd import Variable
from sklearn.utils import shuffle
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder 

import numpy as np
import time

#_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _DEFINING DATA_ _ _ __ _ _ _ _ _ __ _ __

arr1=np.random.rand(22500,4)    #random array with 22500 rows of values (demo dataset)
lbl1=np.zeros(7500)             #7500 labels of 0 for class 0
lbl2=np.ones(7500)              #7500 labels of 1 for class 1
lbl3=np.full(7500,2)            #7500 labels of 2 for class 2
lbl=np.hstack((lbl1,lbl2,lbl3))
label_encode=LabelEncoder()
int_encode=label_encode.fit_transform(lbl)   #integer encoding of the labels (not used further)
onehot_encoder=OneHotEncoder(categories='auto',sparse=False)
lbl=lbl.reshape(len(lbl),1)
lbl=onehot_encoder.fit_transform(lbl)
print(lbl) 


#divide the dataset arr1 into 3 parts: 20% for the test set, 20% of the remainder for the validation set, and the rest for the training set
percent=0.2
data1,label1=shuffle(arr1,lbl)             #shuffle the data values before partition
no=int(round(percent*len(lbl)))            #no stores the number of samples in the 20% split
#print("no.of elements after first cut: ",no)

#test_input - stores 20% of data1 as test data1
#test_label - stores corresponding labels of data points in test_input
#t_input - stores remaining 80% of data points
#t_label - stores remaining corresponding 80% class labels
#valid_input - stores 20% of remaining data as validation dataset
# valid_label - stores corresponding validation set labels
#train_input - stores training dataset
#train_label - stores training labels

test_input,t_input=data1[:no],data1[no:]
test_label,t_label=label1[:no],label1[no:]
noele=int(round(percent*len(t_input)))       #stores 20% of remaining data
#print("no.of elements after second cut: ",noele)
valid_input,train_input=t_input[:noele],t_input[noele:]
valid_label,train_label=t_label[:noele],t_label[noele:]

print("size of training set: {} \n size of test set: {} \n size of validation set: {}".format(len(train_input),len(test_input),
        len(valid_input)))


#converting all datasets into torch tensors
trdata=[torch.tensor([line],dtype=torch.float) for line in train_input]
trlabel=[torch.tensor([line],dtype=torch.float) for line in train_label]
vdata=[torch.tensor([line],dtype=torch.float) for line in valid_input]
vlabel=[torch.tensor([line],dtype=torch.float) for line in valid_label]
tedata=[torch.tensor([line],dtype=torch.float) for line in test_input]
telabel=[torch.tensor([line],dtype=torch.float) for line in test_label]
#print("type of label is: ",vlabel[0].dtype)
#print("type of training label is: ",trlabel[0].dtype)
#print("type of validation dataset is: " ,type(vlabel))
#print("type of training dataset is: ", type(trdata))

#_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _DEFINING THE NETWORK_ _ _ _ _ _ _ ___ __ _ _ _ _ _ _ _ _ _ 

#Two layers network class
class Network(nn.Module):
    def __init__(self):
        super(Network,self).__init__()
        self.block1=nn.Sequential(nn.Linear(4,20),
                                   #nn.BatchNorm1d(20),
                                   nn.Dropout(p=0.25),
                                   nn.ReLU(),
                                   nn.Linear(20,20),
                                   #nn.BatchNorm1d(20),
                                   nn.Dropout(p=0.25),
                                   nn.ReLU())
        self.block2=nn.Sequential( nn.Linear(20,20),
                                   #nn.BatchNorm1d(20),
                                   nn.Dropout(p=0.25),
                                   nn.ReLU(),
                                   nn.Linear(20,3),
                                   nn.Softmax(dim=1))
    #forward pass
    def forward(self,x):
        x=self.block1(x)
        output=self.block2(x)
        #return f.softmax(x,dim=1)
        return output

net=Network()
#use BCEWithLogitsLoss as the loss function; stochastic gradient descent as the optimizer,
#with learning rate, momentum and weight decay for regularization
criterion=nn.BCEWithLogitsLoss()
opt=optim.SGD(net.parameters(),lr=0.001,momentum=0.9,weight_decay=0.1)



#_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _TRAINING DATA_ _ _ _ _ _ _ _ _ _ _ __ _ _ _ _ _ _ _ _ _

#train() computes the loss over a single sample passed to it.
def train(input,target):
    output=net(input)

    print("training data is : ",input)
    print("training output is : ",output[0])
    print("\n")

    loss=criterion(output,target)
    #print("loss is : ",loss)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def trainer(epoch,data,target):
    net.train()
    #batch_loss stores the sum of the loss values computed for each sample in the
    #entire dataset in 1 epoch.
    batch_loss=0.0
    for iter in range(totalbatches):
        #creating batches of size batch_size from the data.
        data1=data[iter*batch_size:(iter+1)*batch_size]
        target1=target[iter*batch_size:(iter+1)*batch_size]
        #print("in trainer : ",target1[0].dtype)
        #losses stores the loss of each sample in a batch, as returned by train()
        losses=[]
        for i in range(len(data1)):
            losses.append(train(data1[i],target1[i]))
        #print("loss for data {} :".format(i+1),loss)
        batch_loss+=sum(losses)
        #print('[%d/%d %4d] train loss: %.5f'%(epoch+1,epochs,(iter+1)*batch_size,sum(loss)/batch_size))
        print("batch %d of epoch %d successful"%(iter+1,epoch+1))
    return(batch_loss/len(data))        #return avg error over entire dataset in 1 epoch

#_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _VALIDATION TESTING_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ __ 

def validation(epoch,vdata,vtarget):
    net.eval()
    cor=0                   #no. of correct predictions
    incor=0                 #no. of incorrect predictions
    valid_loss=0.0          #stores sum of losses over entire dataset
    op=[]                   #stores the predicted class of each sample
    tar=[]                  #stores the target class of each sample
    #print(("before getting inside loop, type is: ",vtarget[0].dtype))
    for iter in range(vbatches):
        vdata1=vdata[iter*batch_size:(iter+1)*batch_size]
        vtarget1=vtarget[iter*batch_size:(iter+1)*batch_size]
        #print(("inside loop, type is: ",vtarget1[0].dtype))
        losses=[]           #stores loss of every data in 1 batch
        for i in range(len(vtarget1)):
            output=net(vdata1[i])
            print("validset data \t:",vdata1[i])
            print("validset output \t:",output)
            print("\n")
            op.append(torch.argmax(output).item())
            tar.append(torch.argmax(vtarget1[i]).item())
            loss=criterion(output,vtarget1[i])
            losses.append(loss)

        valid_loss+=sum(losses)
        #print('[%d %5d] valid loss: %.5f'%(epoch+1,(iter+1)*batch_size,(valid_loss/len(vdata)))
    #compute total number of correct/incorrect predictions
    for i in range(len(op)):
        if op[i]==tar[i]:
            cor+=1
        else:
            incor+=1
    valid_acc=(cor/(cor+incor))*100
    #return avg loss in validation data and validation accuracy over 1 epoch
    return ((valid_loss.item())/len(vdata)),valid_acc

#_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ __ TESTING DATA_ _ _ _ _ _ _ _ _ _ __ _  __ _ _ _ __ _ __

def test(tdata,tlabel):
    net.eval()
    cor=0
    incor=0
    test_op=[]                      #list to store output predictions
    test_tar=[]                     #list to store target values
    error=[]                        #stores the error in the class prediction of each sample
    #print("label is: ")
    #print(tlabel)
    for batch in range(tbatches):
        tdata1=tdata[batch*batch_size:(batch+1)*batch_size]
        tlabel1=tlabel[batch*batch_size:(batch+1)*batch_size]
        for i in range(batch_size):
            output=net(tdata1[i])

            #print("test data is: ", tdata1[i])
            #print("test output is :", output[0])
            #print("\n")

            test_op.append(torch.argmax(output[0]).item())
            test_tar.append(torch.argmax(tlabel1[i]).item())
            error.append(criterion(output,tlabel1[i]))
        print("Batch {} testing done ".format(batch+1))
    avg_error=(sum(error)/len(tdata))    #avg error computed for test data
    #compute total correct/incorrect predictions.
    for i in range(len(test_op)):
        if(test_op[i]==test_tar[i]):
            cor+=1
        else:
            incor+=1
    results=confusion_matrix(test_tar,test_op)
    print("confusion matrix:")
    print(results)
    print("acccuracy score: ")
    print(accuracy_score(test_tar,test_op,normalize=False))
    print("report :")
    print(classification_report(test_tar,test_op))
    print("average error in test set is: {}".format(avg_error))
    print("test accuracy is : {} %".format((cor/(cor+incor))*100))


# _ _ _ _ _ _ _ _ _ _ __ _ _ _ _ _ _ _DRIVER CODE_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ __ _ __ _ 

epochs=6
batch_size=800

trainlen=len(trdata)                    #size of training data
validlen=len(vdata)                     #size of validation data
testlen=len(tedata)                     #size of test data
totalbatches=int(trainlen/batch_size)   #total training batches
vbatches=int(validlen/batch_size)       #total validation batches
tbatches=int(testlen/batch_size)        #total test batches

t=time.time()
#train_loss stores avg loss computed over training set in every epoch
#valid_loss stores avg loss computed over validation set in every epoch
train_loss=[]
valid_loss=[]

for epoch in range(epochs):
    tloss=0.0                       #stores avg training loss returned in 1 epoch
    vloss=0.0                   #stores avg validation loss returned in 1 epoch
    #indices=torch.randperm(trainlen)
    #print(indices)
    #data,label=data[indices],label[indices]
    trdata,trlabel=shuffle(trdata,trlabel,random_state=0)
    #print(len(trdata))
    vdata,vlabel=shuffle(vdata,vlabel,random_state=0)
    t0=time.time()
    tloss=trainer(epoch,trdata,trlabel)
    vloss,valid_acc=validation(epoch,vdata,vlabel)
    #print("epoch {} successful : {} seconds ".format(epoch+1,round((time.time()-t0),3)), end=' ')
    #print("train loss: {}".format(round(tloss,5)))
    #print("validation loss: {}".format(round(vloss,5)))


    print("epoch %d/%d took %.5f seconds" %((epoch+1),epochs,round((time.time()-t0),3)))
    print("training loss: ",round(tloss,5))
    print("validation loss :",round(vloss,5))
    print("validation accuracy is: {}%".format(valid_acc))
    train_loss.append(tloss)
    valid_loss.append(vloss)


print("total training time is {} minutes : ".format(round((time.time()-t)/60,3)))
#print("train loss \t validation loss: ")
#for i in range(len(train_loss)):
#    print(round(train_loss[i],5),'\t', round(valid_loss[i],5)) 

#print(vlabel)



#testing data
#test(tedata,telabel)

Here is the validation result:

validset data   : tensor([[0.4604, 0.5948, 0.1612, 0.2271]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.7745, 0.0985, 0.1097, 0.4409]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.2225, 0.8885, 0.1667, 0.9272]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.4683, 0.1766, 0.2229, 0.3912]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.7469, 0.3891, 0.7210, 0.6673]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.5424, 0.6587, 0.4722, 0.3075]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.3186, 0.5349, 0.9189, 0.7124]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.7580, 0.7256, 0.3102, 0.7087]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.1943, 0.7328, 0.1706, 0.8615]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)


validset data   : tensor([[0.1238, 0.6982, 0.0876, 0.9440]])
validset output         : tensor([[0.3381, 0.3278, 0.3341]], grad_fn=<SoftmaxBackward>)

Also, I changed the loss function to CrossEntropyLoss() and used the class labels 0, 1, 2, but the result was the same.

Please help.


I’ve gotten similar results once. In my case it was because the learning rate was so high that everything diverged and the model just gave equal probabilities for all outputs. I’d advise you to try to overfit a small training set with a low learning rate, 1e-5 perhaps.
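
A minimal sketch of that sanity check (the model, data, and learning rate here are illustrative, not the exact setup from your post):

import torch
import torch.nn as nn
import torch.optim as optim

# Sanity check: a working setup should be able to drive the loss down on a tiny set.
tiny_x = torch.rand(32, 4)              # 32 samples with 4 features each (made-up data)
tiny_y = torch.randint(0, 3, (32,))     # integer class labels 0/1/2

model = nn.Sequential(nn.Linear(4, 20), nn.ReLU(), nn.Linear(20, 3))
criterion = nn.CrossEntropyLoss()       # expects raw logits and integer targets
opt = optim.SGD(model.parameters(), lr=1e-3)   # drop to 1e-5 if the loss diverges

for step in range(1000):
    opt.zero_grad()
    loss = criterion(model(tiny_x), tiny_y)
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(step, loss.item())        # should decrease steadily as the model memorizes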


If you’re using nn.CrossEntropyLoss as the criterion, do not apply softmax at the end of your network.
nn.CrossEntropyLoss is a combination of nn.LogSoftmax and nn.NLLLoss - it internally applies log-softmax and then the negative log-likelihood loss.
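
In code, the intended pattern looks roughly like this (a sketch, not your exact model):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 20),
                      nn.ReLU(),
                      nn.Linear(20, 3))       # ends in raw logits: no nn.Softmax here
criterion = nn.CrossEntropyLoss()

x = torch.rand(8, 4)                          # batch of 8 samples
target = torch.randint(0, 3, (8,))            # integer class indices, not one-hot vectors
loss = criterion(model(x), target)            # log-softmax + NLL applied internally

# At inference time, take the argmax of the logits; softmax is monotonic,
# so it would not change the predicted class anyway.
pred = model(x).argmax(dim=1)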


Thank you for the response. Earlier, I had already implemented the code with Softmax() removed, but the resulting outputs were still the same for all data values.

validset data   : tensor([[0.4765, 0.6959, 0.0277, 0.4108]])
validset output         : tensor([[-0.0545, -0.0238,  0.0783]], grad_fn=<AddmmBackward>)


validset data   : tensor([[0.6540, 0.9209, 0.4184, 0.8473]])
validset output         : tensor([[-0.0545, -0.0238,  0.0783]], grad_fn=<AddmmBackward>)


validset data   : tensor([[0.1009, 0.3274, 0.9295, 0.5534]])
validset output         : tensor([[-0.0545, -0.0238,  0.0783]], grad_fn=<AddmmBackward>)


validset data   : tensor([[0.3595, 0.9765, 0.8850, 0.2885]])
validset output         : tensor([[-0.0545, -0.0238,  0.0783]], grad_fn=<AddmmBackward>)


validset data   : tensor([[0.6617, 0.3173, 0.8160, 0.5156]])
validset output         : tensor([[-0.0545, -0.0238,  0.0783]], grad_fn=<AddmmBackward>)


validset data   : tensor([[0.8830, 0.8857, 0.8681, 0.1318]])
validset output         : tensor([[-0.0545, -0.0238,  0.0783]], grad_fn=<AddmmBackward>)


validset data   : tensor([[0.9223, 0.7067, 0.8740, 0.3126]])
validset output         : tensor([[-0.0545, -0.0238,  0.0783]], grad_fn=<AddmmBackward>)


validset data   : tensor([[0.1338, 0.2998, 0.2755, 0.2248]])
validset output         : tensor([[-0.0545, -0.0238,  0.0783]], grad_fn=<AddmmBackward>)

I reduced the learning rate to 1e-5 and the outputs are now different, thank you. But the problem is that in the output, the highest probability always occurs at one index only, say 2, so the predicted class is 2 for all data points. I have run several epochs but keep getting the same class value. Kindly suggest what I should do.

validset data   : tensor([[0.1263, 0.9880, 0.8812, 0.5894]])
validset output         : tensor([[0.0481, 0.0847, 0.1271]], grad_fn=<AddmmBackward>)
target output is:        tensor([1])
label output is:         2
label target is:         1


validset data   : tensor([[0.2822, 0.1722, 0.3539, 0.0999]])
validset output         : tensor([[0.0201, 0.0679, 0.1299]], grad_fn=<AddmmBackward>)
target output is:        tensor([1])
label output is:         2
label target is:         1


validset data   : tensor([[0.1996, 0.1763, 0.7774, 0.9449]])
validset output         : tensor([[0.0392, 0.0913, 0.1218]], grad_fn=<AddmmBackward>)
target output is:        tensor([2])
label output is:         2
label target is:         2


validset data   : tensor([[0.3247, 0.0582, 0.7237, 0.0739]])
validset output         : tensor([[0.0199, 0.0653, 0.1324]], grad_fn=<AddmmBackward>)
target output is:        tensor([0])
label output is:         2
label target is:         0


validset data   : tensor([[0.4353, 0.9065, 0.1725, 0.4158]])
validset output         : tensor([[0.0432, 0.0784, 0.1270]], grad_fn=<AddmmBackward>)
target output is:        tensor([2])
label output is:         2
label target is:         2


validset data   : tensor([[0.9252, 0.5795, 0.1595, 0.7732]])
validset output         : tensor([[0.0479, 0.0946, 0.1213]], grad_fn=<AddmmBackward>)
target output is:        tensor([2])
label output is:         2
label target is:         2
output: 2        target: 0
output: 2        target: 0
output: 2        target: 1
output: 2        target: 0
output: 2        target: 2
output: 2        target: 1
output: 2        target: 1
output: 2        target: 1
output: 2        target: 0
output: 2        target: 1
output: 2        target: 2
output: 2        target: 1
output: 2        target: 1
output: 2        target: 0
output: 2        target: 2
output: 2        target: 0
output: 2        target: 2
output: 2        target: 1

To reiterate what I said before: a common technique is to check whether your model is capable of overfitting the training set. So try to train and validate on the training data to confirm that the model outputs different results.

Once that is working, I have to question the validity of using random input data. What are you hoping to find out from the experiment?


The task is to classify a multi-spectral image (4 bands) into 3 classes, and I had to do pixel-based classification. So the structure of the code I pasted here is an exact replica of my original code, except for the data, which is a random array here.
I was stuck with these identical outputs and the same predicted class label, so I tried it on a random array first to isolate the errors.
If I get the same output class labels on all the training data too, what changes should be made to the model?
Could you please suggest which factors should be modified to get different results? That would be really helpful.

I don’t have the time to go through all your code, sorry. I suspect the training is off somehow. Limit your training data to only e.g. 10 batches and print the loss after loss=criterion(output,target). If it doesn’t go down, it means that your data/labels aren’t set up properly, I believe.
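
Using the names from the code above, that check could look like this (a sketch; the 10-batch cap is the only change):

max_batches = 10                      # cap for the sanity check
for it in range(min(totalbatches, max_batches)):
    data1 = trdata[it*batch_size:(it+1)*batch_size]
    target1 = trlabel[it*batch_size:(it+1)*batch_size]
    for i in range(len(data1)):
        output = net(data1[i])
        loss = criterion(output, target1[i])
        print("loss:", loss.item())   # should trend downward over iterations
        opt.zero_grad()
        loss.backward()
        opt.step()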

GL!

Yeah, I understand. Thank you so much.


@Oli Hi, can you say a bit more about “data/labels aren’t set up properly”? What did you mean, and how can I set up my data “properly”, as you said?

Hi, maybe my wording was a bit odd. What I meant was that it’s good to verify that each input you feed to the model actually matches its label. If you’re dealing with images, it’s quite easy to display a few images with their labels just before you put them into the model.
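
For example, something like this (a sketch assuming matplotlib and hypothetical images/labels arrays):

import matplotlib.pyplot as plt

# Eyeball a few samples together with their labels before training starts.
for i in range(4):
    plt.subplot(1, 4, i + 1)
    plt.imshow(images[i])         # images[i]: one input sample as a 2D array (hypothetical)
    plt.title(str(labels[i]))     # its label, so mismatches are easy to spot
    plt.axis("off")
plt.show()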

@Oli, ah I see. Anyway, I met the same problem, but I figured out the cause: I had used BatchNorm1d before a Linear layer; maybe it scaled the activations so much that the final Linear layer mapped everything to nearly the same value.

Anyway, have you ever tried a CNN model for a regression problem? I have tried to train some models, but they seem to return values only within a fixed range (so they miss targets that are too big or too small).

Hi, yes, I’ve tried regression problems with CNNs. If your output is bounded, you can simply scale the outputs from the model. This allows the model to keep its weights small, which is good for L2 weight regularization. Some options are linear scaling, log/exp scaling, or predicting the offset from a typical value.

output = model(input) * 100 # Linear scaling
output = model(input) + guess_output # Add expected value
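output = torch.exp(model(input)) # Log/exp scaling (a sketch; assumes strictly positive targets)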

I haven’t done this in a while, so I’d advise you to look for second opinions on Google. Good luck!


May I ask what you mean by “apply softmax at the end of your network”?