Accuracy value error

Greetings, I have a problem with my evaluation code: I sometimes get validation accuracies above 100%, for example 127%.
Could it be an error related to CUDA, or something else?
Thanks in advance!

def evaluation(model, loader, epoch, mini_batch_size, sequence_size):
    model.eval()
    test_loss = 0
    correct = 0
    size_input = mini_batch_size * sequence_size
    for (inputs, labels, agreement_score) in loader:
        if inputs.size(0) == size_input:
            # full batch: fold the flat token dimension into (batch, sequence)
            inputs = inputs.clone().reshape(mini_batch_size, sequence_size, inputs.size(1))
            labels = labels.clone().squeeze().reshape(mini_batch_size * sequence_size)
            agreement_score = agreement_score.clone().squeeze().reshape(mini_batch_size * sequence_size)
        else:
            # incomplete batch: pad up to mini_batch_size * sequence_size
            (inputs, labels, agreement_score) = padd_incomplete_sequences(
                inputs, labels, agreement_score, mini_batch_size, sequence_size)

        inputs, labels, agreement_score = Variable(inputs.cuda()), Variable(labels.cuda()), Variable(agreement_score.cuda())

        output = model(inputs)

        loss = criterion(output, labels)
        loss = loss.mean()

        # sum up batch loss
        test_loss += loss.item()
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(labels.data.view_as(pred)).cuda().sum()

    accuracy = 100. * correct / len(loader.dataset)

    validation_accuracy.append(accuracy)
    validation_loss.append(loss.item())
    validation_epochs.append(epoch)

    #test_loss /= len(test_loader.dataset)
    print('\nTest set: loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss,
        correct,
        len(loader.dataset),
        accuracy))

Outputs:
Test set: loss: 1.6754, Accuracy: 329153/258686 (127%)

Could you print the shapes of output, pred and labels?

Sure sir,
Inputs shape : torch.Size([24, 20000, 100])
Output shape : torch.Size([480000, 6])
Labels shape : torch.Size([480000])
Agreement score shape: torch.Size([480000])

Could it be a CUDA error?
I train the model on CUDA and compute the loss and accuracy on the CPU.

It looks like you are reshaping the batch dimension to mini_batch_size * seq_length, which might be the error.
The length of your Dataset returns the number of samples, while you multiply this number by seq_length for the labels.

It is an implementation fault that I wrote the code that way, but in this case mini_batch_size * sequence_length is exactly 24 * 20000 = 480000, which matches the shapes above.
In other words, MINI_BATCH_SIZE == 24 and SEQUENCE_LENGTH == 20000.

The reason why I'm reshaping is CrossEntropyLoss.
The model works well in general, but it sometimes spits out these high accuracy values.
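For reference, here is a minimal sketch of the flattening I do for the loss (random stand-in tensors, and I'm assuming reduction='none' on the criterion, since I call loss.mean() afterwards):

import torch
import torch.nn as nn

batch, seq, num_classes = 24, 20000, 6
logits = torch.randn(batch, seq, num_classes)           # per-position class scores
targets = torch.randint(0, num_classes, (batch, seq))   # one class index per position

criterion = nn.CrossEntropyLoss(reduction='none')
# nn.CrossEntropyLoss expects (N, C) logits and (N,) targets,
# so both tensors are flattened over the batch * sequence positions.
loss = criterion(logits.reshape(batch * seq, num_classes),
                 targets.reshape(batch * seq))
print(loss.shape)  # torch.Size([480000])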

I think that might be the reason: your Dataset length is smaller than the number of computed predictions, since your Dataset returns its length based on the batch dimension of the data.
Assuming you have 240 samples, the DataLoader will return 10 batches of 24 samples each, and len(loader.dataset) will also return 240.

Now if you reshape (or pad) these batches to 24*20000 positions, your count of correct predictions can end up much higher than that overall length.
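A minimal sketch with these hypothetical numbers (random tensors, 6 classes as in your output shape):

import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.randn(240, 100), torch.randint(0, 6, (240,)))
loader = DataLoader(dataset, batch_size=24)

print(len(loader.dataset))  # 240 -> number of samples stored in the Dataset
print(len(loader))          # 10  -> number of batches
# If each batch is then reshaped/padded to 24 * 20000 positions before the
# comparison, `correct` accumulates up to 480000 predictions per batch, while
# the denominator len(loader.dataset) still counts only the 240 samples above,
# so the accuracy ratio can exceed 100%.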


You were right sir!
I was padding the incomplete batches with zeroes and I was still computing the accuracy using the old dataset length.

Now I made the following changes:



def evaluation(model, loader, epoch, mini_batch_size, sequence_size):
    model.eval()
    test_loss = 0
    correct = 0
    padded_size = 0
    size_input = mini_batch_size * sequence_size
    for (inputs, labels, agreement_score) in loader:
        if inputs.size(0) == size_input:
            # full batch: fold the flat token dimension into (batch, sequence)
            inputs = inputs.clone().reshape(mini_batch_size, sequence_size, inputs.size(1))
            labels = labels.clone().squeeze().reshape(mini_batch_size * sequence_size)
            agreement_score = agreement_score.clone().squeeze().reshape(mini_batch_size * sequence_size)
        else:
            # incomplete batch: remember how many positions are padding
            # (only the last, incomplete batch is expected to hit this branch)
            padded_size = size_input - inputs.size(0)
            (inputs, labels, agreement_score) = padd_incomplete_sequences(
                inputs, labels, agreement_score, mini_batch_size, sequence_size)

        inputs, labels, agreement_score = Variable(inputs.cuda()), Variable(labels.cuda()), Variable(agreement_score.cuda())

        output = model(inputs)

        loss = criterion(output, labels)
        loss = loss.mean()

        # sum up batch loss
        test_loss += loss.item()
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(labels.data.view_as(pred)).cuda().sum()

    # include the padded positions in the denominator so it matches
    # the number of predictions that were actually compared
    accuracy = 100. * correct / (len(loader.dataset) + padded_size)

    validation_accuracy.append(accuracy)
    validation_loss.append(loss.item())
    validation_epochs.append(epoch)

    #test_loss /= len(test_loader.dataset)
    print('\nTest set: loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss,
        correct,
        len(loader.dataset) + padded_size,
        accuracy))

You saved me sir, thanks again!
