Is my loss function wrong?

I was wondering if my loss function calculation is wrong:

def train(model, optimizer, train_loader, epoch, writer):
    # epoch and writer are currently unused; use_cuda and criterion are defined elsewhere
    model.train()

    train_loss = 0
    correct = 0

    train_predicted = []
    train_target = []

    for data, target in train_loader:
        # move the batch to the GPU if available
        if use_cuda:
            data, target = data.cuda(), target.cuda()

        data = data.float()
        target = target.long()

        optimizer.zero_grad()

        output = model(data)
        _, predicted = torch.max(output.data, 1)

        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        train_loss += loss.data

        # collect predictions and targets for the whole epoch
        train_predicted = train_predicted + predicted.tolist()
        train_target = train_target + target.tolist()

        correct += (target == predicted).sum().item()

    train_accuracy = 100. * correct / len(train_loader.dataset)
    print('accuracy is ', train_accuracy)

    train_loss /= len(train_loader.dataset)
    return (train_accuracy, train_loss)

Your code doesn’t show any implementation of a loss function, just the usage of criterion, which looks alright assuming the input tensor shapes and dtypes are correct.
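For reference, a minimal sketch of the shapes and dtypes the criterion would expect, assuming it is nn.CrossEntropyLoss (the posted code doesn’t show its definition; the cross-entropy loss mentioned later in the thread suggests it):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # default reduction='mean'

batch_size, num_classes = 8, 5
output = torch.randn(batch_size, num_classes)          # raw logits, float32, shape [N, C]
target = torch.randint(0, num_classes, (batch_size,))  # class indices, int64 (long), shape [N]

loss = criterion(output, target)  # scalar tensor: mean loss over the batch
print(loss.item())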

I thought the right way to do this is to follow one of the two methods explained in this post:

Either I use the averaged batch loss calculated by the criterion, sum those values, and divide by the number of batches, or I use the losses of all samples of every batch, sum them all, and divide by the size of the dataset.
Which one am I missing here? So either I add the multiplication by the batch size inside the loop, or I divide by the number of batches outside the loop instead of by the dataset size?

Right now you are dividing the accumulated loss by the number of samples in the Dataset via:

train_loss /= len(train_loader.dataset)

which assumes train_loss is the sum of the losses of all samples.
Based on:

train_loss += loss.data

This would be the case if you are using the sum reduction. If not (e.g. if you are using the mean reduction), you would need to scale it with the current batch size (i.e. output.size(0)).
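As a minimal sketch of that scaling (assuming criterion uses the default reduction='mean'), the accumulation inside the posted loop would become:

# inside the batch loop
loss = criterion(output, target)            # mean loss over the current batch
loss.backward()
optimizer.step()
train_loss += loss.item() * output.size(0)  # scale by the batch size -> sum of per-sample losses

# after the loop
train_loss /= len(train_loader.dataset)     # per-sample average loss over the whole epoch

Using loss.item() here just returns a Python float instead of keeping a tensor around; loss.data as in the original code would also work.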


If I am using the default definition of the loss function (which uses the mean reduction by default), I assume I can also use this instead:

train_loss /= len(train_loader)

which corresponds to way 2 in the previous post you replied to, i.e. the sum of the average batch losses divided by the number of batches?

This would work if all batches contain the same number of samples. In case the last one has fewer samples, you would add a small error to the loss calculation as also described here.
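To make that concrete, here is a small numeric illustration with made-up per-sample losses, where a dataset of 10 samples and a batch size of 4 leads to batches of 4, 4 and 2 samples:

import torch

# hypothetical per-sample losses for 10 samples, batch size 4 -> batches of 4, 4, 2
sample_losses = torch.tensor([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
batches = [sample_losses[0:4], sample_losses[4:8], sample_losses[8:10]]

true_mean = sample_losses.sum() / len(sample_losses)                 # sum of sample losses / dataset size
mean_of_batch_means = sum(b.mean() for b in batches) / len(batches)  # sum of batch means / number of batches

print(true_mean.item())            # 5.5
print(mean_of_batch_means.item())  # ~6.17 -> the smaller last batch is over-weighted

In practice the discrepancy is usually small, but the result is not exactly the per-sample mean unless all batches have the same size.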


Then I would go with way 1: summing the losses of all samples over the dataset and dividing by the size of the dataset. My validation cross-entropy loss is very high (> 1) compared to the training one, and the validation accuracy is around 65 % while the training accuracy is 95 %. This is clear overfitting, isn’t it?
PS: I have a small validation set of around 2000 samples (the same for the test set), but the training set has around 22000 samples.

Yes, given the large gap between the training and validation accuracy, your model is overfitting to the training dataset.
