CrossEntropyLoss for sequences - Loss and Accuracy calculation

import torch
import torch.nn as nn

class MyDataClassification(nn.Module):
    def __init__(self):
        super(MyDataClassification, self).__init__()

        # branch a: processes x1 (ch1 input channels)
        self.layer_1a = torch.nn.Conv1d(in_channels=ch1, out_channels=32, kernel_size=4, stride=1)
        self.layer_2a = torch.nn.Conv1d(in_channels=32, out_channels=16, kernel_size=3, stride=1)
        self.layer_3a = torch.nn.Conv1d(in_channels=16, out_channels=1, kernel_size=2, stride=1)

        # branch b: processes x2 (ch2 input channels)
        self.layer_1b = torch.nn.Conv1d(in_channels=ch2, out_channels=32, kernel_size=4, stride=1)
        self.layer_2b = torch.nn.Conv1d(in_channels=32, out_channels=16, kernel_size=3, stride=1)
        self.layer_3b = torch.nn.Conv1d(in_channels=16, out_channels=1, kernel_size=2, stride=1)

        self.relu = nn.ReLU()  # a single stateless ReLU can be reused after every conv
        self.layer_3 = nn.Linear(whatever_value_makes_this_work, seq_len)
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x1, x2):
        x1 = self.relu(self.layer_1a(x1))
        x1 = self.relu(self.layer_2a(x1))
        x1 = self.relu(self.layer_3a(x1))

        x2 = self.relu(self.layer_1b(x2))
        x2 = self.relu(self.layer_2b(x2))
        x2 = self.relu(self.layer_3b(x2))

        x = torch.add(x1, x2)  # element-wise sum of the two branches
        x = torch.flatten(x, start_dim=2, end_dim=-1)
        x = self.layer_3(x)
        x = self.layer_4(x)  # note: layer_4 is called here but never defined in __init__ above
        return x

My input x1 is a tensor of shape [batch_size, ch1, seq_len] and x2 is of shape [batch_size, ch2, seq_len].
My target is of shape [batch_size, seq_len].
The output of the above model is [batch_size, no_of_classes, seq_len].
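
For reference, nn.CrossEntropyLoss accepts exactly these K-dimensional shapes: the class dimension comes second in the input, and the target holds class indices. A minimal, self-contained sketch (the sizes are illustrative, not taken from the model above):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()
    batch_size, no_of_classes, seq_len = 8, 5, 100  # illustrative sizes

    logits = torch.randn(batch_size, no_of_classes, seq_len)         # raw model output
    target = torch.randint(0, no_of_classes, (batch_size, seq_len))  # one class index per position

    loss = criterion(logits, target)  # scalar, averaged over all batch * seq_len positions
    print(loss.item())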

I am using CrossEntropyLoss() and the model seems to be training.
But when I print the loss and accuracy using

    print(f'Epoch {e+0:03}: | Train Loss: {train_epoch_loss/len(train_loader.dataset):.5f} | Val Loss: {val_epoch_loss/len(val_loader.dataset):.5f} | Train Acc: {train_epoch_acc/len(train_loader.dataset):.3f} | Val Acc: {val_epoch_acc/len(val_loader.dataset):.3f}')

the accuracy values are extremely high, in the thousands or tens of thousands! What is going wrong? Am I making some error that I am not aware of here?

It seems the accuracy calculation is wrong, so could you post the corresponding code and explain how these values are calculated?


This is how I calculate the loss

    y_train_pred = model(X1_train_batch, X2_train_batch)
    train_loss = criterion(y_train_pred, y_train_batch)  # CrossEntropyLoss on [N, C, L] vs [N, L]
    train_acc = multi_acc(y_train_pred, y_train_batch)

    train_loss.backward()
    optimizer.step()

    # accumulate per-batch values over the epoch
    train_epoch_loss += train_loss.item()
    train_epoch_acc += train_acc.item()

where

def multi_acc(y_pred, y_test):
    # y_pred: [batch_size, no_of_classes, seq_len], y_test: [batch_size, seq_len]
    y_pred_softmax = torch.log_softmax(y_pred, dim=1)
    _, y_pred_tags = torch.max(y_pred_softmax, dim=1)  # predicted class per position: [batch_size, seq_len]

    correct_pred = (y_pred_tags == y_test).float()
    acc = correct_pred.sum() / len(correct_pred)

    acc = torch.round(acc * 100)

    return acc

And the loss function used is CrossEntropyLoss() as mentioned.

This is the snippet I usually use to print the values:

    print(f'Epoch {e+0:03}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(val_loader):.5f} | Train Acc: {train_epoch_acc/len(train_loader):.3f} | Val Acc: {val_epoch_acc/len(val_loader):.3f}')

It seems that multi_acc returns the accuracy (in %) for each batch, and the training loop accumulates these per-batch percentages. Later you then divide by the number of batches.
An example run for 3 batches and 30 samples would thus be:

    train_epoch_acc = 90 + 80 + 70  # per-batch percentages returned by multi_acc
    train_epoch_acc / len(train_loader) = 240 / 3 = 80

so it looks alright, assuming all batches contain the same number of samples (otherwise you would add a bias to the calculation).
Note that you’ve previously divided by len(train_loader.dataset), which gives the number of samples, while len(train_loader) returns the number of batches.
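
If the batches can have different sizes, one common way to avoid that bias (sketched under the assumption that multi_acc returns a per-batch percentage, as above) is to weight each batch by its sample count; note that the final division by len(train_loader.dataset) then becomes the correct normalization:

    # inside the loop: weight each batch's percentage by its size
    train_epoch_acc += train_acc.item() * y_train_batch.size(0)

    # after the epoch: normalize by the total number of samples
    print(train_epoch_acc / len(train_loader.dataset))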


Yes, so there is nothing wrong with the calculation when using train_loader.dataset, right? I shouldn’t be getting an accuracy in the thousands. Do you have any idea where it could be going wrong?

No, using len(train_loader.dataset) would be wrong, as described before, since you would normalize the accumulated batch percentages by the number of samples instead of the number of batches.
Add print debug statements and check the intermediate values against the workflow I’ve outlined above.
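
A minimal version of such a check, assuming the training loop posted earlier (the expected shapes follow the description above):

    # inside the training loop, e.g. for the first batch only
    print(y_train_pred.shape)   # expected: [batch_size, no_of_classes, seq_len]
    print(y_train_batch.shape)  # expected: [batch_size, seq_len]
    print(train_acc.item())     # a per-batch percentage should never exceed 100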


Sorry. Yeah, that makes more sense. But I still end up with values like 27855.0 for train_epoch_acc/len(train_loader) :sweat:
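
Given the sequence shapes above, a likely culprit is inside multi_acc itself: correct_pred has shape [batch_size, seq_len], and len() of a 2-D tensor returns only its first dimension, so the sum of correct predictions is divided by batch_size instead of batch_size * seq_len. The returned "percentage" then scales with seq_len (roughly 100 * seq_len * per-position accuracy), which would produce values like 27855.0. A corrected sketch (taking argmax over the raw logits is equivalent, since log_softmax is monotonic):

    def multi_acc(y_pred, y_test):
        # y_pred: [batch_size, no_of_classes, seq_len], y_test: [batch_size, seq_len]
        y_pred_tags = torch.argmax(y_pred, dim=1)       # [batch_size, seq_len]
        correct_pred = (y_pred_tags == y_test).float()  # 1.0 where the predicted class matches
        acc = correct_pred.mean()                       # divides by batch_size * seq_len
        return torch.round(acc * 100)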