Loss does not decrease, outputs are all the same

Hi, I’m new to PyTorch and I was trying to create a simple classifier to determine the predominance of each emotional state in a person. My data consists of 32 means of frequency bins from an FFT done on EEG data. The data has 3 labels: [1,0], [0,1] and [0.5,0.5] (the last one meaning that both emotional states share predominance in the data). I’ve tried various learning rates, numbers of epochs, hidden layer sizes and batch sizes. The loss does not decrease during training, and when testing on the training data the outputs are all close to [0.5,0.5]. The code is here:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(32, 16)  # 32 FFT bin means in
        self.fc2 = nn.Linear(16, 16)
        self.fc3 = nn.Linear(16, 8)
        self.fc4 = nn.Linear(8, 2)    # 2 emotional-state scores out

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)

        return F.softmax(x, dim=1)


net = Net()

criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=.001, momentum=0.9)

EPOCHS = 2

net.train()
for epoch in range(EPOCHS):
    for X, y in trainset:
        optimizer.zero_grad()
        output = net(X.view(-1, 32))
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()
    print(loss.item())  # loss of the last batch in this epoch

Thanks in advance!

For a classifier, you want to avoid MSELoss and use NLL loss instead.
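
A minimal sketch of that change, assuming Net.forward returns F.log_softmax(x, dim=1) instead of F.softmax(x, dim=1), and that each sample gets a single integer class index (0 for [1,0], 1 for [0,1]):

import torch
import torch.nn as nn

criterion = nn.NLLLoss()  # expects log-probabilities and integer class targets

output = torch.log_softmax(torch.randn(4, 2), dim=1)  # stand-in for net(X.view(-1, 32))
y = torch.tensor([0, 1, 0, 1])                        # class indices, not one-hot vectors
loss = criterion(output, y)

Equivalently, you can keep the network returning raw scores (drop the softmax) and use nn.CrossEntropyLoss, which applies log_softmax internally.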

For the optimizer, try Adam; it might work better than SGD. If you use Adam, you don’t need to tune parameters such as the learning rate much, since the default is supposed to be good enough.
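
For example, as a drop-in replacement for your SGD line, using Adam’s defaults:

import torch.optim as optim

optimizer = optim.Adam(net.parameters())  # defaults: lr=1e-3, betas=(0.9, 0.999)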

Finally, try LeakyReLU, which is less likely to get stuck compared to ReLU.
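
In your forward pass that could look like this (a sketch only; F.leaky_relu defaults to a negative slope of 0.01, and nn.LeakyReLU is the module form):

    def forward(self, x):
        x = F.leaky_relu(self.fc1(x))  # keeps a small gradient (slope 0.01) for negative inputs
        x = F.leaky_relu(self.fc2(x))
        x = F.leaky_relu(self.fc3(x))
        x = self.fc4(x)
        return F.softmax(x, dim=1)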

Looking at your architecture, you seem to be stacking a bunch of linear layers. For your specific domain, an emotional state classifier, you might want to search Google for newer architectures to try. A stack of linear layers could be good enough, though.

Hi. Thanks for the reply. How can I go about using NLL loss, considering my labels are 1 by 2 tensors? Doesn’t NLL loss expect a single value as the label, i.e. a class index? I have 2 classes, but I also have data that represents some overlap between those two classes (the [0.5,0.5]-labelled data), and I don’t want to label that data as a 3rd class because it really isn’t one.

Miguel, for this 1 by 2 tensor, are its values continuous (e.g. [0.2, 0.35] would be allowed) or discrete (only [0, 1], [1, 0], [0.5, 0.5])?

If it is continuous, then MSE might be better. If it is discrete, then NLL is better; you can actually just use 0, 1, 2 as labels for those 3 discrete values.
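
As a rough sketch of that relabelling (to_class_index is a hypothetical helper; it assumes the labels only ever take those three exact values):

import torch

def to_class_index(label):
    # map the three possible label vectors to integer classes 0, 1, 2
    if torch.equal(label, torch.tensor([1.0, 0.0])):
        return 0
    if torch.equal(label, torch.tensor([0.0, 1.0])):
        return 1
    return 2  # [0.5, 0.5]

labels = [torch.tensor([1.0, 0.0]), torch.tensor([0.5, 0.5])]
y = torch.tensor([to_class_index(lbl) for lbl in labels])  # tensor([0, 2])

The final linear layer would then need 3 outputs instead of 2.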

BTW, you might want to check the sample distribution as well. For example: if you have discrete values but your training data is mostly [0, 1] and very few [1, 0], this imbalance could cause the loss to saturate. You can mitigate this by passing a weight tensor to nll_loss.
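
Something along these lines (the weight values are made up for illustration; in practice you would set them roughly inversely proportional to class frequency):

import torch
import torch.nn.functional as F

log_probs = torch.log_softmax(torch.randn(4, 3), dim=1)  # stand-in network output for 3 classes
targets = torch.tensor([0, 0, 0, 1])                     # mostly class 0, rarely class 1

# up-weight the under-represented class so it contributes more to the loss
class_weights = torch.tensor([0.5, 2.0, 1.0])
loss = F.nll_loss(log_probs, targets, weight=class_weights)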

The labels are indeed only [0,1], [1,0] and [0.5,0.5], but I wouldn’t consider them discrete, since [0.5,0.5] is not a class of its own but the equal presence of the other two labels, which is why I think using 3 discrete values would be wrong in this case. The [0.5,0.5] labels are there to show the machine that both emotional states can be present, resulting in neither being prevalent. I thought MSE was the best way to go about it, but I could totally be in the wrong here.
Also, the sample distribution is fine, with approximately the same number of data samples for each label.

I see. Well, there is a saying that deep learning is kind of like chemistry: trying out different strategies and seeing what works. I wouldn’t be surprised if NLL loss is helpful here.

I actually tried using NLL loss with labels 0 (for [1,0]) and 1 (for [0,1]) (didn’t use the [0.5, 0.5] data) and got the same results. I’m going to redo the signal processing; maybe I’m losing information while processing. Thanks for the replies though.
