Train and validation, really bad accuracy


I am working on a musical note classifier.
I have a simple multilayer perceptron written in PyTorch. My musical notes are isolated; I am just trying to recognize one note at a time.

My perceptron has an input of size 252 and an output of size 88.
In the target, if I want to recognize note 2, I have a tensor like:
[0,0,1,0 …,0,0,0], with 88 elements.

The train_loader returns an input of shape [batch_size, 252] and a target of shape [batch_size, 88].

The dataset is normalized, but the accuracy is really bad (around 9%).

I have checked everything and tried different hyperparameters, but the network doesn’t give me good results.

My train code is:

def train(model, train_loader, optimizer, criterion):
    train_loss = 0  # running sum of the per-batch losses
    for idx_batch, (input, target) in enumerate(train_loader):
        input = input.cuda()
        target = target.cuda()
        outputs = model(input)
        loss = criterion(outputs, target)
        train_loss += loss.item()
    return train_loss
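
The function as posted never calls the optimizer, so it is worth checking that the real training loop actually updates the weights. Below is a minimal sketch of one epoch with the usual zero_grad / backward / step sequence. It runs on CPU, and the `train_one_epoch` helper, the toy model, the random data and the loss choice are all assumptions standing in for the real MLP and loader:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, optimizer, criterion):
    """Hypothetical helper: one pass over the data with weight updates."""
    model.train()
    total = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()                     # clear gradients from the previous batch
        loss = criterion(model(inputs), targets)
        loss.backward()                           # compute gradients
        optimizer.step()                          # update the weights
        total += loss.item()
    return total / len(loader)                    # mean of the per-batch losses

torch.manual_seed(0)
# Toy stand-ins (assumptions): 64 random samples, 252 features, 88 classes.
x = torch.randn(64, 252)
y = torch.randint(0, 88, (64,))
loader = DataLoader(TensorDataset(x, y), batch_size=16)
model = nn.Sequential(nn.Linear(252, 128), nn.ReLU(), nn.Linear(128, 88))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

losses = [train_one_epoch(model, loader, optimizer, criterion) for _ in range(5)]
print(losses)
```

If the loss does not move at all from epoch to epoch, a missing `backward()` or `step()` call is one of the first things to rule out.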

Each time I train, I am doing a validation test:

def val(model, val_loader, optimizer, criterion):
    val_loss = 0
    with torch.no_grad():  # no gradients are needed for validation
        for idx_batch, (input, target) in enumerate(val_loader):
            input = input.cuda()
            target = target.cuda()
            outputs = model(input)
            loss = criterion(outputs, target)
            val_loss += loss.item()
    return val_loss

This is the code where I run training and validation for each epoch; afterwards, I run the test section.

for epoch in range(1, N_EPOCHS+1):
    print("------------EPOCHS ", epoch, "------------")
    train_loss = 0
    val_loss = 0
    train_loss = train(mlp, train_loader, optimizer, criterion)
    val_loss = val(mlp, val_loader, optimizer, criterion)
    # Mean of losses
    train_loss = train_loss/len(train_loader.dataset)
    val_loss = val_loss/len(val_loader.dataset)
    # Stats training and validation.
    print('Epoch: {} \tTraining loss: {:.6f} \tValidation loss:{:.6f}'.format(epoch, train_loss, val_loss))
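
One thing to keep in mind when reading these numbers: with the default `reduction='mean'`, `loss.item()` is already averaged over the batch, so dividing the summed batch losses by `len(loader.dataset)` shrinks the reported value by roughly the batch size. A small sketch in plain Python, with made-up per-sample losses, comparing the two ways of averaging:

```python
# Made-up per-sample losses for a dataset of 6 samples (assumption for illustration).
sample_losses = [0.9, 0.7, 0.8, 0.6, 0.5, 0.7]
batch_size = 2

# What loss.item() would return for each batch under reduction='mean'.
batches = [sample_losses[i:i + batch_size]
           for i in range(0, len(sample_losses), batch_size)]
batch_means = [sum(b) / len(b) for b in batches]

summed = sum(batch_means)
per_dataset = summed / len(sample_losses)  # what the epoch loop above computes
per_batch = summed / len(batch_means)      # mean loss per sample (equal-size batches)
true_mean = sum(sample_losses) / len(sample_losses)

print(per_dataset, per_batch, true_mean)
```

This only affects the scale of the printed loss, not the training itself; dividing by `len(train_loader)` (the number of batches) gives a value that is comparable across batch sizes.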

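test_loss = 0
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        data = data.to('cuda')
        target = target.to('cuda')
        outputs = mlp(data)
        loss = criterion(outputs, target)
        test_loss += loss.item()
        pred = torch.max(outputs, 1)[1]  # index of the highest output
        t = torch.max(target, 1)[1]      # index of the 1 in the one-hot target
        correct += pred.eq(t).sum().item()
print("Correct: {}/{}".format(correct, len(test_loader.dataset)))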

I’m new to PyTorch and maybe I’m not doing something right.

Thank you for your help.

(balamurali) #2

Do you mean 88 classes? Is it a classification problem? What loss function do you use?



Yes, it is a classification problem. I have 88 possible musical notes, and each target represents an isolated note.

My loss function is binary cross entropy loss, nn.BCELoss(). This is because the target looks like:
[0,0,0,0,1,0,0,0, …,0]. The position with the 1 represents the note, so in this case the note is number 4.
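
To see how BCE behaves on targets like this, here is a rough sketch in plain Python (the probabilities are made up for illustration). A model that outputs a small probability for nearly all 88 notes gets a low BCE loss even though it picks the wrong note, which is one way a low loss can coexist with poor accuracy:

```python
import math

NUM_NOTES = 88

def bce(probs, target):
    """Mean binary cross entropy over all outputs (the default nn.BCELoss reduction)."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for p, t in zip(probs, target)]
    return sum(terms) / len(terms)

# One-hot target for note 4.
target = [0] * NUM_NOTES
target[4] = 1

# Made-up prediction: close to "all zeros", slightly favouring the wrong note 0.
probs = [0.01] * NUM_NOTES
probs[0] = 0.02

loss = bce(probs, target)
predicted_note = max(range(NUM_NOTES), key=lambda i: probs[i])

print(f"BCE loss: {loss:.4f}, predicted note: {predicted_note}, true note: 4")
```

For comparison, an uninformed output of 0.5 everywhere gives a loss of about 0.69, so predicting "almost nothing" is already rewarded heavily when 87 of the 88 targets are zero.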

(balamurali) #4

You should be careful in selecting losses.

  1. If it is a multiclass problem use cross entropy loss.

  2. If it is a binary class problem use binary cross entropy loss.

  3. If it is a regression problem use MSE.

You should use either point 1 or point 3. BCE is not good for your task.


As you say, I’m using BCE loss, which means binary cross entropy loss, number two on your list, so I am using the right function.

(balamurali) #6

Can you please explain the problem you are working on? Forget about the DL part.


I have isolated musical notes preprocessed with a transformation function, the constant-Q transform (CQT).

This function returns each note to me as a two-dimensional matrix of shape (252, n), where n is variable and depends on the audio duration. Each row represents a note, so each note is an input.

Each of these preprocessed notes has a corresponding target, a matrix of shape (88, n), initialized with 0s. Each row of the targets represents the target of a row in the input data.

If my note is 0, the first row is filled with 1s and the rest with 0s. And if the note is 3, the fourth row is filled with 1s and the rest with 0s.

After this, I concatenated all the inputs and all the targets and transposed them.

Each input of size 252 is paired with a target of size 88.
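
The layout described above can be sketched as follows, with random tensors standing in for the CQT output (the note numbers and frame counts are made up). Since every frame of a note carries the same label, the targets can also be stored as one class index per frame instead of an (88, n) matrix:

```python
import torch

NUM_BINS, NUM_NOTES = 252, 88

# Stand-ins for two preprocessed notes: (note index, CQT matrix of shape (252, n)).
notes = [(0, torch.randn(NUM_BINS, 5)), (3, torch.randn(NUM_BINS, 7))]

# Concatenate along the time axis, then transpose so each row is one input frame.
inputs = torch.cat([cqt for _, cqt in notes], dim=1).T           # shape (12, 252)

# One class index per frame, repeated for every frame of that note.
labels = torch.cat([torch.full((cqt.shape[1],), note, dtype=torch.long)
                    for note, cqt in notes])                     # shape (12,)

# Equivalent one-hot targets of shape (12, 88), as used in the question.
one_hot = torch.nn.functional.one_hot(labels, NUM_NOTES).float()

print(inputs.shape, labels.shape, one_hot.shape)
```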

I am trying to make the neural network learn which note is being fed in.

If something is not clear, I can try to explain it in another way.


Sorry, I said it’s a classification problem, but it is a binary classification problem.


As far as I understand from your problem statement, you are trying to classify the input as one of 88 notes, correct? And each input can be only one of the 88 output notes?

So, can you change the loss function to

criterion = nn.CrossEntropyLoss()

because I see that you are trying to solve a multi-class classification problem.
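
A minimal sketch of that change, assuming the targets are still stored one-hot as in the question: `nn.CrossEntropyLoss` takes raw, unnormalized scores (so no sigmoid or softmax on the last layer) together with class indices of shape [batch_size], which `argmax` recovers from the one-hot rows. The model and data below are toy stand-ins, not the original MLP:

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, num_features, num_notes = 4, 252, 88

model = nn.Linear(num_features, num_notes)   # toy stand-in; outputs raw logits
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(batch_size, num_features)
one_hot = torch.zeros(batch_size, num_notes)
one_hot[torch.arange(batch_size), torch.tensor([4, 0, 87, 4])] = 1.0

class_idx = one_hot.argmax(dim=1)            # [batch_size] tensor of note indices
logits = model(inputs)                       # [batch_size, 88]
loss = criterion(logits, class_idx)

# Accuracy: compare the highest-scoring note with the target index.
pred = logits.argmax(dim=1)
accuracy = (pred == class_idx).float().mean().item()
print(loss.item(), accuracy)
```

With this setup the test loop no longer needs `torch.max(target, 1)[1]`, because the targets are already indices.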


I think I am using the correct loss function, because my targets are just 0s and 1s. So it is a binary classification problem.


We have also discussed the criterion in this thread, so maybe BCELoss was working better in this use case?
Could you clarify it @Leon_Lopez?

While you can use BCELoss for your use case (you won’t get a code error), I would still recommend using nn.CrossEntropyLoss and applying the changes discussed in the other topic.