Training a ResNet on an increasing fraction of a dataset

Hello there!

For an ongoing research project, we are using an NF-ResNet-26 model and training it to classify two outcomes (yes/no). We'd like to follow this approach: take a fraction of the dataset (1%-10%) and train on each fraction for 50 epochs. Afterwards the model is tested on the whole validation dataset, the fraction is increased, and so on. I've never tried an approach like this, so I want to make sure that the code I set up works properly. It looks like this:

import torch
from torch.utils.data import DataLoader, Subset

# Assumes model, optimizer, criterion, device, train_dataset and
# test_loader (used as the validation loader) are defined elsewhere.

results = {
    'fraction': [],
    'epoch': [],
    'train_loss': [],
    'train_accuracy': [],
    'val_loss': [],
    'val_accuracy': []
}

num_epochs = 50
total_samples = len(train_dataset)

for fraction in range(1, 11):  # 1% to 10%
    # Calculate the number of samples for the current fraction
    num_samples = int(total_samples * (fraction / 100))
    subset_indices = list(range(num_samples))

    # Create a subset of the dataset
    subset = Subset(train_dataset, subset_indices)
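    # Note: subset_indices always takes the *first* num_samples entries in
    # dataset order, so each larger fraction is a superset of the previous
    # one and the subsets are nested. If train_dataset is ordered (e.g. by
    # class), a shuffled index list may be preferable; one possible
    # alternative (an assumption, not part of the original code):
    #   subset_indices = torch.randperm(total_samples)[:num_samples].tolist()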
 

    # Create a DataLoader for the current subset
    train_loader = DataLoader(subset, batch_size=32, shuffle=True)

    print(f"Training with {fraction}% of the dataset ({num_samples} samples)...")

    for epoch in range(num_epochs):

        model.train()  # Set model to training mode

        running_loss = 0.0
        train_correct = 0
        train_total = 0
        for inputs, labels in train_loader:

            inputs, labels = inputs.to(device), labels.to(device)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            _, predicted = torch.max(outputs, 1)
            train_total += labels.size(0)
            train_correct += (predicted == labels).sum().item()

            # Backward pass and optimization
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        # Print the average loss for this epoch
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')
        # Print the average accuracy for this epoch
        print(f'Epoch[{epoch+1}/{num_epochs}],Train_Accuracy: {100 * train_correct / train_total:.2f}%')

        # Evaluate on the whole validation set after each epoch
        model.eval()  # Set model to evaluation mode
        val_loss = 0.0
        correct = 0
        total = 0
        with torch.no_grad():
            for inputs, labels in test_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.item()

                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        print(f'Validation Loss: {val_loss/len(test_loader):.4f}, Accuracy: {100 * correct / total:.2f}%')

        results['fraction'].append(fraction)
        results['epoch'].append(epoch + 1)
        results['train_loss'].append(running_loss/len(train_loader))
        results['train_accuracy'].append(100 * train_correct / train_total)
        results['val_loss'].append(val_loss/len(test_loader))
        results['val_accuracy'].append(100 * correct / total)
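Since results accumulates one row per (fraction, epoch) pair, it can be handy to turn it into a table at the end for plotting or export. A minimal sketch, assuming pandas is available (the filename is just an example):

import pandas as pd

# One row per (fraction, epoch) combination
results_df = pd.DataFrame(results)
results_df.to_csv('fraction_training_results.csv', index=False)

# For example, look at the final-epoch validation accuracy per fraction
print(results_df[results_df['epoch'] == num_epochs][['fraction', 'val_accuracy']])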

In particular, I'm not sure about the following: do I understand correctly that the model is trained on a fraction (let's say 1%), and when the fraction is increased (i.e., 2% of the dataset is sampled), the model is not trained from scratch but continues from the weights learned on the 1% fraction? And so on?

I would appreciate any feedback a lot; I just want to be crystal clear that I am following the correct path here.

Thank you in advance

Best

Yes, this is the case as long as you don’t recreate the model or re-initialize its parameters explicitly.
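If you instead wanted an independent run per fraction, you would have to restore the initial weights before each fraction and recreate the optimizer, since the optimizer also carries state (e.g. momentum buffers). A minimal sketch under that assumption (the Adam settings are illustrative, not taken from the original code):

import copy

# Snapshot the freshly initialized weights once, before the fraction loop
initial_state = copy.deepcopy(model.state_dict())

for fraction in range(1, 11):
    # Restore the initial weights so each fraction trains from scratch
    model.load_state_dict(initial_state)
    # Recreate the optimizer so its internal statistics are reset as well
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # ... training loop for this fraction as above ...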
