The accuracy of the Model is constant

I have a problem with multi-class classification. I built this model, but the test accuracy stays constant at 4.

import torch
from torch import nn
from torch.autograd import Variable

input_size = 13
hidden1_size = 1024
hidden2_size = 1024
hidden3_size = 1024
hidden4_size = 1024
hidden5_size = 1024
output_size = 1976

class DNN(nn.Module):
    def __init__(self, input_size, hidden1_size, hidden2_size, hidden3_size,
                 hidden4_size, hidden5_size, output_size):
        super(DNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden1_size)
        self.sig1 = nn.Sigmoid()
        self.fc2 = nn.Linear(hidden1_size, hidden2_size)
        self.sig2 = nn.Sigmoid()
        self.fc3 = nn.Linear(hidden2_size, hidden3_size)
        self.sig3 = nn.Sigmoid()
        self.fc4 = nn.Linear(hidden3_size, hidden4_size)
        self.sig4 = nn.Sigmoid()
        self.fc5 = nn.Linear(hidden4_size, hidden5_size)
        self.sig5 = nn.Sigmoid()
        self.fc6 = nn.Linear(hidden5_size, output_size)

    def forward(self, x):
        out = self.fc1(x)
        out = self.sig1(out)
        out = self.fc2(out)
        out = self.sig2(out)
        out = self.fc3(out)
        out = self.sig3(out)
        out = self.fc4(out)
        out = self.sig4(out)
        out = self.fc5(out)
        out = self.sig5(out)
        out = self.fc6(out)
        return out

model = DNN(input_size, hidden1_size, hidden2_size, hidden3_size,
            hidden4_size, hidden5_size, output_size)
criterion = nn.CrossEntropyLoss()
learning_rate = 0.008
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(1, 50):
    for i, (X_train, y_train) in enumerate(train_loader):
        model.train()
        optimizer.zero_grad()
        outputs = model(Variable(X_train))
        loss = criterion(outputs, Variable(y_train))
        print('Iter %d/%d --> loss %f' % (i, len(train_loader), loss.item()))
        loss.backward()
        optimizer.step()

    # evaluation after each epoch
    correct = 0
    total = 0
    print('test')
    for X_test, y_test in test_loader:
        model.eval()
        out = model(Variable(X_test)).detach()
        pred = out.max(dim=1)[1]  # index of the max logit, i.e. the predicted class
        total += y_test.size(0)
        correct += (pred.squeeze() == y_test).sum().item()
    accuracy = 100 * correct / total
    print('epoch: {}.  Accuracy: {}'.format(epoch, accuracy))

Try to overfit a small data sample (e.g. just 10 samples) to make sure you don’t have any hidden bugs in your code and that your model architecture works for this problem.
From my past experience I would say that ReLU activations often work better than sigmoids, so you could play around with the architecture and some hyperparameters.
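A minimal sketch of that overfitting check, reusing the dataset behind your train_loader (the small ReLU model, layer sizes, learning rate, and epoch count are just illustrative assumptions, not your exact setup):

import torch
from torch import nn
from torch.utils.data import DataLoader, Subset

# Take ~10 samples from the existing training data (assumes train_loader is already defined).
small_set = Subset(train_loader.dataset, range(10))
small_loader = DataLoader(small_set, batch_size=10)

# A smaller ReLU network just for the sanity check.
tiny_model = nn.Sequential(
    nn.Linear(13, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1976),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(tiny_model.parameters(), lr=1e-3)

for epoch in range(300):
    for X, y in small_loader:
        optimizer.zero_grad()
        loss = criterion(tiny_model(X), y)
        loss.backward()
        optimizer.step()

with torch.no_grad():
    X, y = next(iter(small_loader))
    acc = (tiny_model(X).argmax(dim=1) == y).float().mean().item()
    print('accuracy on the 10 samples:', acc)  # should reach ~1.0 if the pipeline is bug-free

If the model cannot reach roughly 100% accuracy on just 10 samples, the bug is in the data pipeline or training loop rather than in the model capacity.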

@ptrblck

The problem is that the network always predicts the most frequent class in the labels.
So, do you have any idea how to tackle this problem?

In addition to using ReLU as the activation, you should also add some dropout. You can also try passing a weight to CrossEntropyLoss to reduce the bias toward the most frequent classes caused by the imbalanced data.
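For the dropout part, a sketch of what this suggestion could look like (assumption: same input/output sizes as the model above; the hidden size and the dropout probability 0.5 are placeholders):

from torch import nn

model = nn.Sequential(
    nn.Linear(13, 1024), nn.ReLU(), nn.Dropout(p=0.5),   # ReLU instead of Sigmoid, plus dropout
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(1024, 1976),                                # raw logits for CrossEntropyLoss
)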

@G.M
Thanks a lot, but can you give me an example of how I can add weights to BCE loss?

Aren’t you using CrossEntropyLoss? I think you should use cross entropy for multi-class classification.
For the ordinary BCELoss, according to the docs, weight is just a per-element factor that gets multiplied with each value in a batch, so the shape of weight must match the shape of a single batch.

>>> import torch as tc
>>> from torch import nn
>>> bsz = 10
>>> loss0 = nn.BCELoss(weight = tc.full([bsz], 0.5))  # "weight" holds one factor per element; any values are allowed
>>> loss1 = nn.BCELoss()
>>> inp, tar = tc.zeros(bsz), tc.ones(bsz)
>>> loss0(inp, tar)
tensor(13.8155)
>>> loss1(inp, tar)
tensor(27.6310)

@G.M
Sorry for the typo, I already use CrossEntropyLoss. So, can you edit the example for CrossEntropyLoss?

That’s ok :slight_smile:. For CrossEntropyLoss it’s straightforward: provide a tensor of shape [C] (C is the number of classes, and the class ids range over [0, C)). Each value represents the weight of one class, and the weights should be positive. For example:

from torch import nn
import torch as tc

num_cls = 100
weights = tc.rand([num_cls])                # one positive weight per class, shape [C]
loss = nn.CrossEntropyLoss(weight=weights)

@G.M
Thanks a lot. Should this weight be connected to any layer of the model, or just used in the loss as you showed me?

Another thing: to accelerate my model I use the ReLU activation function and dropout layers and increase the hidden layers. This makes the loss decrease and the accuracy increase, but only slowly. Do you have any ideas on how I can make them improve faster?

  1. Usually, the weight of a class is something like the inverse of the frequency of that class (a minimal sketch follows after this list).
  2. I think it is slow because dropout makes the model converge more slowly; my suggestion is to remove some Linear layers. Removing layers can speed up training, reduce memory usage, and reduce overfitting. Currently you have 6 Linear layers; 2 or 3 should be enough :slight_smile:.
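For point 1, a minimal sketch of inverse-frequency class weights (assumes a hypothetical 1-D tensor train_labels holding all training class ids; the rescaling line is optional):

import torch as tc
from torch import nn

num_cls = 1976
counts = tc.bincount(train_labels, minlength=num_cls).float()  # samples per class
weights = 1.0 / counts.clamp(min=1)                            # inverse frequency, avoids division by zero
weights = weights * num_cls / weights.sum()                    # optional: rescale so the mean weight is ~1
criterion = nn.CrossEntropyLoss(weight=weights)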