Keras and PyTorch Conv2d give different output shapes

Thank you. I have attached the results of 200 epochs for fewer samples. From the graph, can I conclude there's no issue with the model? Also, if we get the best model at the 8th iteration, why do we have to train the model for many epochs until it converges to zero?


Thank you very much

If your model isn’t even able to overfit a tiny subset (e.g. just 10 samples) of the dataset, you might still have errors in your training code. Right now the training loss is stuck and your model still yields wrong predictions for the small subset.

Hi! Thank you for your suggestions. I have used nn.BCEWithLogitsLoss as my loss function; the training loss got stuck at 0.3, and when I use nn.CrossEntropyLoss it gives me the following graph (converging to zero).

  1. Can't both loss functions be used for binary classification?
  2. The validation accuracy is very poor (with fewer samples: 50), even though the training loss converges to zero. What could be the reason?

I have also attached the code; please suggest how to improve the validation accuracy.

import numpy as np
import torch
import torch.nn as nn

learning_rate = 0.0001
criterion = nn.CrossEntropyLoss()  # nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, betas=(0.9, 0.999), amsgrad=False)

class EarlyStopping:
    """Early stops the training if validation loss doesn't improve after a given patience."""
    def __init__(self, patience=7, verbose=False, delta=0, path='checkpoint.pt', trace_func=print):
        """
        Args:
            patience (int): How long to wait after last time validation loss improved.
                            Default: 7
            verbose (bool): If True, prints a message for each validation loss improvement. 
                            Default: False
            delta (float): Minimum change in the monitored quantity to qualify as an improvement.
                            Default: 0
            path (str): Path for the checkpoint to be saved to.
                            Default: 'checkpoint.pt'
            trace_func (function): trace print function.
                            Default: print            
        """
        self.patience = patience
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False
        self.val_loss_min = np.inf
        self.delta = delta
        self.path = path
        self.trace_func = trace_func
    def __call__(self, val_loss, model):

        score = -val_loss

        if self.best_score is None:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
        elif score < self.best_score + self.delta:
            self.counter += 1
            self.trace_func(f'EarlyStopping counter: {self.counter} out of {self.patience}')
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
            self.counter = 0

    def save_checkpoint(self, val_loss, model):
        '''Saves the model when the validation loss decreases.'''
        if self.verbose:
            self.trace_func(f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}).  Saving model ...')
        torch.save(model.state_dict(), self.path)
        self.val_loss_min = val_loss


def train_model(model, batch_size, patience, n_epochs):
    
    # to track the training loss as the model trains
    train_losses = []
    # to track the validation loss as the model trains
    valid_losses = []
    # to track the average training loss per epoch as the model trains
    avg_train_losses = []
    # to track the average validation loss per epoch as the model trains
    avg_valid_losses = []
    
    # initialize the early_stopping object
    early_stopping = EarlyStopping(patience=patience, verbose=True)
    
    for epoch in range(1, n_epochs + 1):

        ###################
        # train the model #
        ###################
        model.train() # prep model for training
        for batch, (features,label) in enumerate(train_dataloader):
            features = features.unsqueeze(1).to(device)  # shape: (32, 1, 12, 301)
            label = label.to(device)
            # Clear the gradients
            optimizer.zero_grad()
            # Forward Pass
            target = model(features)
            # Find the Loss
            loss = criterion(target, label)
            # Calculate gradients
            loss.backward()
            # Update Weights
            optimizer.step()
            # Calculate Loss
            train_losses.append(loss.item())
            _, predicted = torch.max(target, 1)
            actual = torch.argmax(label, dim=1) #torch.max(label, 1)
            correct_t = (predicted == actual).sum().item()
            accuracy_train = 100 * correct_t / target.shape[0]
        ######################    
        # validate the model #
        ######################
        model.eval() # prep model for evaluation
        with torch.no_grad():  # no gradients are needed for validation
            for features, label in valid_dataloader:
                features = features.unsqueeze(1).to(device)
                label_v = label.to(device)
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(features)
                # Find the Loss (using the labels that were moved to the device)
                loss = criterion(output, label_v)
                # Record the Loss
                valid_losses.append(loss.item())
                _, predicted_v = torch.max(output, 1)
                actual_v = torch.argmax(label_v, dim=1)
                correct_v = (predicted_v == actual_v).sum().item()
                # note: this accuracy reflects only the last validation batch
                accuracy_valid = 100 * correct_v / output.shape[0]
        # print training/validation statistics 
        # calculate average loss over an epoch
        train_loss = np.average(train_losses)
        valid_loss = np.average(valid_losses)
        avg_train_losses.append(train_loss)
        avg_valid_losses.append(valid_loss)
        
        epoch_len = len(str(n_epochs))
        
        print_msg = (f'[{epoch:>{epoch_len}}/{n_epochs:>{epoch_len}}] ' +
                     f'train_loss: {train_loss:.5f} ' +
                     f'Accuracy_train: {accuracy_train:.5f} ' +
                     f'valid_loss: {valid_loss:.5f} ' +
                     f'Accuracy_valid: {accuracy_valid:.5f}')
        
        print(print_msg)
        
        # clear lists to track next epoch
        train_losses = []
        valid_losses = []
        
        # early_stopping needs the validation loss to check if it has decreased,
        # and if it has, it will make a checkpoint of the current model
        early_stopping(valid_loss, model)
        
        if early_stopping.early_stop:
            print("Early stopping")
            break
        
    # load the last checkpoint with the best model
    model.load_state_dict(torch.load('checkpoint.pt'))

    return  model, avg_train_losses, avg_valid_losses

batch_size = 5
n_epochs = 500
patience = 100
model, train_loss, valid_loss = train_model(model, batch_size, patience, n_epochs)

Thank you very much.

  1. You could use either loss function, but you would need to change the model architecture, since nn.BCEWithLogitsLoss expects a single output from the model while nn.CrossEntropyLoss expects two outputs for a binary classification use case (a small sketch follows this list).

  2. The model is properly overfitting to the training dataset, so poor validation accuracy with only ~50 samples is expected.
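
As a rough sketch of the difference in point 1 (using hypothetical random logits, not your model's actual outputs):

import torch
import torch.nn as nn

targets = torch.tensor([0, 1, 1, 0])  # hypothetical binary labels for 4 samples

# Option A: one output unit per sample + nn.BCEWithLogitsLoss (float targets)
logits_single = torch.randn(4, 1)  # model would end in nn.Linear(..., 1)
loss_bce = nn.BCEWithLogitsLoss()(logits_single.squeeze(1), targets.float())

# Option B: two output units per sample + nn.CrossEntropyLoss (class-index targets)
logits_two = torch.randn(4, 2)  # model would end in nn.Linear(..., 2)
loss_ce = nn.CrossEntropyLoss()(logits_two, targets)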

Can I use LogSoftmax at the output layer and nn.CrossEntropyLoss as the loss function for binary classification? Here's the model:

class ConvNet1D(nn.Module):
    def __init__(self):
        super(ConvNet1D, self).__init__()
        self.ECG1 = nn.Sequential(
            nn.Conv2d(1, 16, (12, 12)),  # change this
        )
        self.tanh1 = nn.Tanh()
        self.dropout = nn.Dropout(p=0.4)
        self.flat = nn.Flatten()
        # initialize our softmax classifier
        self.fc2 = nn.Linear(in_features=16*1*290, out_features=2)  # classes = 2
        self.logSoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.tanh1(self.ECG1(x))
        x = self.dropout(x)
        x = self.flat(x)
        x = self.fc2(x)
        # return the output predictions
        output = self.logSoftmax(x)
        return output

# Loss and optimizer
learning_rate = 0.001
criterion = nn.CrossEntropyLoss()  # nn.BCEWithLogitsLoss() / F.binary_cross_entropy_with_logits
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, betas=(0.9, 0.999), amsgrad=False)

Thank you very much

Yes, this would be possible but wasteful, since nn.CrossEntropyLoss is already applying F.log_softmax internally. You could thus use nn.NLLLoss instead which expects log probabilities as the model output.
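
A small sketch of that equivalence (random logits, not tied to your model):

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(8, 2)           # raw model outputs, no activation
targets = torch.randint(0, 2, (8,))  # class indices

# nn.CrossEntropyLoss applies F.log_softmax internally ...
loss_ce = nn.CrossEntropyLoss()(logits, targets)
# ... so it matches nn.NLLLoss applied to explicit log-probabilities
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_nll))  # True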

So, can I remove the LogSoftmax layer and pass the output of the fully connected layer (which has two classes) directly to the cross entropy loss? Here's the code below:

class ConvNet1D(nn.Module):
    def __init__(self):
        super(ConvNet1D, self).__init__()
        self.ECG1 = nn.Sequential(
            nn.Conv2d(1, 16, (12, 12)),  # change this
        )
        self.tanh1 = nn.Tanh()
        self.dropout = nn.Dropout(p=0.4)
        self.flat = nn.Flatten()
        # the last linear layer now outputs raw logits for the two classes
        self.fc2 = nn.Linear(in_features=16*1*290, out_features=2)  # classes = 2

    def forward(self, x):
        x = self.tanh1(self.ECG1(x))
        x = self.dropout(x)
        x = self.flat(x)
        # return the raw logits (no LogSoftmax)
        output = self.fc2(x)
        return output
# Loss and optimizer
learning_rate = 0.001
criterion = nn.CrossEntropyLoss()  # nn.BCEWithLogitsLoss() / F.binary_cross_entropy_with_logits
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, betas=(0.9, 0.999), amsgrad=False)

Yes, you can directly pass the output of the last linear layer to nn.CrossEntropyLoss as it will represent raw logits.
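
As a minimal usage sketch with the model above (the batch, labels, and device choice here are hypothetical; if your labels are one-hot you would convert them to class indices first, e.g. with torch.argmax(labels, dim=1)):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ConvNet1D().to(device)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 1, 12, 301, device=device)      # hypothetical input batch
labels = torch.randint(0, 2, (32,), device=device)  # class indices 0/1, not one-hot

logits = model(x)                 # raw logits straight from the last linear layer
loss = criterion(logits, labels)  # CrossEntropyLoss handles the softmax internally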

Thank you. I use early stopping and my input shape is (7000125000): 7000 samples, with batch size 32. I use a simple one-layer 2D CNN network with an FC layer. I get the best model in the first epoch. What does that mean? Is the model learning?
Suggestions, please.