Training issues with basic 1D-CNN

Hi everyone,

Here’s my problem. I have a basic 1D-CNN trained for regression (my training data are spectra of length 543).
I get no errors during training (the input shape seems to be in the right format, [N_obs, Channel, Width]), but there is definitely a problem with my model: it cannot overfit even a handful of samples (5 in this example: 4 in training, 1 in test), and my training loss curve looks weird (it decreases very quickly, then reaches a plateau and stays there indefinitely).

I think I need an external point of view to see if there is something wrong in my code. Thank you for your help.

Here’s my complete code.

My main function:

    num_epochs = 300
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using {device} device")

    X_train, X_test, y_train, y_test = train_test_split(
        X_train, y_train, train_size=0.8)
    
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
    # output: (4, 543) (1, 543) (4,) (1,)
    
    train_data = TabularDataset(X_train, y_train)
    test_data = TabularDataset(X_test, y_test)

    model = ConvNet_1D().to(device)  # move the model to the same device as the data
    optimizer = Adam(model.parameters(), lr=0.001)

    train_dataloader = DataLoader(train_data, batch_size=128, shuffle=True)
    test_dataloader = DataLoader(test_data, batch_size=128, shuffle=True)

    for X, y in test_dataloader:
        print("Shape of X: ", X.shape, X.dtype)
        # Output: Shape of X:  torch.Size([1, 1, 543]) torch.float64
        print("Shape of y: ", y.shape, y.dtype)
        break

    # Model training
    losses_x: list = []
    losses_y: list = []
    for epoch in range(0, num_epochs):
        print(f"Epoch {epoch + 1}\n----------------------------")
        loss_x = training_function(train_dataloader, model, optimizer, device)
        loss_y = testing_function(test_dataloader, model, device)
        losses_x.append(loss_x)
        losses_y.append(loss_y)

My CNN architecture:

    from torch import nn

    class ConvNet_1D(nn.Module):
        '''
        Defines a 1D-CNN architecture with batch normalization and max pooling layers.
        '''
        def __init__(self, dropout=0):
            super(ConvNet_1D, self).__init__()

            self.ConvNet = nn.Sequential(
                nn.Conv1d(in_channels=1, out_channels=32, kernel_size=3),
                nn.ReLU(),
                nn.BatchNorm1d(32),
                nn.MaxPool1d(kernel_size=2),

                nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3),
                nn.ReLU(),
                nn.BatchNorm1d(64),
                nn.MaxPool1d(kernel_size=2),

                nn.Conv1d(in_channels=64, out_channels=128, kernel_size=3),
                nn.ReLU(),
                nn.BatchNorm1d(128),
                nn.MaxPool1d(kernel_size=2),

                nn.Conv1d(in_channels=128, out_channels=256, kernel_size=3),
                nn.ReLU(),
                nn.BatchNorm1d(256),
                nn.MaxPool1d(kernel_size=2),

                nn.Dropout(dropout),
                nn.Flatten(),
            )

            self.task = nn.Sequential(
                nn.Linear(8192, 100),
                nn.Dropout(dropout),
                nn.ReLU(),
                nn.Linear(100, 1),
            )

        def forward(self, x):
            x = self.ConvNet(x)
            output = self.task(x)
            return output
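
As a sanity check on the 8192 input size of the first linear layer: each Conv1d with kernel_size=3 trims 2 positions and each MaxPool1d(2) halves the length, so 543 → 541 → 270 → 268 → 134 → 132 → 66 → 64 → 32, and 256 × 32 = 8192. This can be verified with a dummy forward pass (a quick sketch, assuming the class above):

    import torch

    # verify the flattened feature size with a dummy batch
    model = ConvNet_1D()
    dummy = torch.randn(2, 1, 543)      # [batch, channels, width]
    print(model.ConvNet(dummy).shape)   # expected: torch.Size([2, 8192])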

My dataloader:

    import numpy as np
    import torch
    from torch.utils.data import Dataset


    class TabularDataset(Dataset):
        def __init__(self, train, labels, transform=None):
            self.train = train
            self.labels = labels
            self.transform = transform

        def __len__(self):
            return len(self.labels)

        def get_data(self):
            return self.train

        def __getitem__(self, idx):
            # note: torch.tensor on a float64 NumPy array yields a float64 tensor,
            # which is why the dataloader check above prints torch.float64
            x = torch.tensor(self.train[idx]).unsqueeze(0)
            y = torch.tensor(self.labels[idx])

            # apply optional transformations to the data here
            if self.transform:
                x, y = self.transform(x), self.transform(y)

            return x, y

My training function:

    import torch

    from xxx.utils import RMSELoss


    def training_function(dataloader, model, optimizer, device, scheduler=None):
        model.train()  # re-enable training mode (testing_function switches the model to eval)
        last_loss = 0.0
        for batch, (X, y) in enumerate(dataloader):
            X, y = X.to(device), y.to(device)

            # Compute the prediction error
            # (note: pred has shape [B, 1] while y has shape [B]; depending on how
            # RMSELoss is implemented this can silently broadcast to [B, B], so
            # squeezing pred may be safer)
            pred = model(X.float())
            loss = RMSELoss(pred, y)

            # Reset the gradients before each batch
            optimizer.zero_grad()

            # Back-propagate the loss and take one optimizer step
            loss.backward()
            optimizer.step()

            last_loss = loss.item()
            if batch % 100 == 0:
                print(f"training loss: {last_loss:>7f}")

        # return after the whole epoch, not from inside the batch loop
        return last_loss


    def testing_function(dataloader, model, device):
        num_batches = len(dataloader)
        model.eval()
        test_loss = 0

        # disable gradient tracking (no gradients are needed on the test set)
        with torch.no_grad():
            for X, y in dataloader:
                X, y = X.to(device), y.to(device)
                pred = model(X.float())
                test_loss += RMSELoss(pred, y).item()
        test_loss /= num_batches
        print(f"Validation loss: {test_loss:>8f} \n")
        return test_loss

The standard answer to “the loss doesn’t decrease enough”/“cannot overfit a small amount of data” is “increase capacity”, i.e. use more channels etc. What happens if you play with the number of channels?

Best regards

Thomas

Hi Thomas!

I tried increasing the model capacity (channels ×2), but it didn’t improve the training loss; if anything, the opposite. With only 10 samples (8 in training and 2 in test) I get a pretty large RMSE on the training set, while I get almost 0 with a simple (unoptimized) gradient boosting model.

Here are my training and test losses: [loss-curve plot not reproduced]

Three things I would look at (from “duh” moments I have had over the years) are

  • are all the dimensions used correctly, e.g. not using the batch dimension as features or some such (not as easy to get wrong with CNNs as with RNNs, but worth double-checking)?
  • do gradients vanish? (you have batch norm, but still… a quick check is sketched after this list)
  • does the range the model produces actually match the range of your targets? (should be OK with ReLU + Linear at the end; mostly a problem with Tanh and friends)
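
A minimal sketch of such a check: print per-parameter gradient norms right after loss.backward() and watch whether the norms in the early layers shrink toward zero (this assumes the model from the original post):

    # call right after loss.backward(): vanishing gradients show up as
    # norms that shrink toward zero in the early (first conv) layers
    def print_grad_norms(model):
        for name, param in model.named_parameters():
            if param.grad is not None:
                print(f"{name:40s} grad norm: {param.grad.norm().item():.3e}")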

Best regards

Thomas

The data is not being normalized. Is that intended?

Thanks Arul for your answer.
I do have a normalization stage, but it’s not in my TabularDataset class (though I could indeed move it into that class).
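
For illustration, per-spectrum standardization inside the dataset could look like this (a sketch of a replacement __getitem__ only; it also casts to float32, and the normalization is applied to x, not to the regression target y):

    # sketch: __getitem__ with built-in per-spectrum standardization
    def __getitem__(self, idx):
        x = torch.tensor(self.train[idx], dtype=torch.float32).unsqueeze(0)
        x = (x - x.mean()) / (x.std() + 1e-8)  # zero mean, unit variance
        y = torch.tensor(self.labels[idx], dtype=torch.float32)
        return x, y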

If possible, try the following things and observe the behavior (sketched after the list):

  1. Play around with the learning rate (specifically, try reducing the learning rate and see if it helps)
  2. Try adding a weight_decay parameter
  3. Replace Adam with SGD and see what happens
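
Concretely, the three variations could look like this (a sketch; the specific values are placeholders, not recommendations):

    from torch.optim import Adam, SGD

    # 1. a reduced learning rate
    optimizer = Adam(model.parameters(), lr=1e-4)

    # 2. Adam with L2 regularization via weight_decay
    optimizer = Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

    # 3. plain SGD, optionally with momentum
    optimizer = SGD(model.parameters(), lr=1e-3, momentum=0.9)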

My model receives as input a shape of [128, 1, 543] ([batch_size, N_channels, N_features]), so that part should be OK.

As for vanishing gradients, it’s indeed on my list of things to check (but I’m still not sure how to verify it).

My model should predict values from 0 up to a certain value (it depends on the target, but no more than 20,000), so Linear at the end should be OK.

Thanks Arul.

  1. If I decrease or increase the learning rate, the performance is a bit different, but the model is still underfitting.
  2. With a weight_decay of 0.05 I get similar results.
  3. With SGD I get flat training and test losses (if I increase the learning rate, I think I get exploding gradients, with NaNs in the losses).
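
If those NaNs really are exploding gradients, one standard guard, sketched here against the training loop above, is to clip the global gradient norm between backward() and step():

    loss.backward()
    # cap the global gradient norm; max_norm=1.0 is an arbitrary placeholder
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()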

When it comes to underfitting, @tom’s suggestion to increase the model capacity is in the right direction.

2 × 10^4 is quite a large value for a target. I would try normalizing the target as well, to a smaller range (preferably [-1, 1]).
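
For instance, with scikit-learn’s MinMaxScaler this could look like the following sketch (fit on the training targets only, and invert the scaling on the predictions):

    from sklearn.preprocessing import MinMaxScaler

    # scale the regression targets to [-1, 1] using training statistics only
    y_scaler = MinMaxScaler(feature_range=(-1, 1))
    y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
    y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1)).ravel()

    # after inference, map predictions back to the original units:
    # y_pred = y_scaler.inverse_transform(pred.numpy().reshape(-1, 1)).ravel()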

With normalized labels, my model starts to behave normally in terms of train/validation loss. Thanks.