99%+ validation accuracy, but doesn't generalize well (CNN)

Hi there,

I’ve been working on a multi-class classifier and have hit a wall getting it to generalize. It takes a product name (e.g. “Amazon Basics Pencil”), a specification (e.g. “Length”: “5 Inches”), and a category (e.g. “Pencils & Pens”) and predicts one of 1000+ classes.

These classes are also product categories, but they are usually more specific than the input category (e.g. “Pencils” rather than “Pencils & Pens”). They’re also imbalanced, so I used a weighted sampler on the training loader to keep the batches diverse.
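For reference, a minimal sketch of class-balanced sampling with `torch.utils.data.WeightedRandomSampler`, assuming the training labels are available as a list of integer class ids (`make_weighted_sampler` and the toy labels are hypothetical). Each example is weighted by the inverse of its class count, so rare classes are drawn about as often as common ones. A sampler passed to a `DataLoader` only changes which examples appear in the batches that loader serves; it does not affect which examples end up in the validation split.

```python
from collections import Counter

import torch
from torch.utils.data import WeightedRandomSampler


def make_weighted_sampler(labels):
    """Hypothetical helper: sample every class roughly equally often."""
    counts = Counter(labels)
    # Inverse-frequency weight per example: rare classes get larger weights.
    weights = torch.tensor([1.0 / counts[y] for y in labels], dtype=torch.double)
    # replacement=True lets rare-class examples be drawn more than once per epoch.
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)


torch.manual_seed(0)  # make the toy draw reproducible
labels = [0] * 90 + [1] * 10  # toy data: 90 examples of class 0, 10 of class 1
sampler = make_weighted_sampler(labels)
drawn = [labels[i] for i in sampler]
print(Counter(drawn))  # roughly 50/50 rather than 90/10
```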

I have 200K+ labelled examples, with a minimum of 25 per class. Results over 10 epochs are here:

Epoch 01: | Train Loss: 4.38 | Val Loss: 1.26 | Train Acc: 43.46 | Val Acc: 97.20
# Other epochs
Epoch 10: | Train Loss: 3.47 | Val Loss: 0.41 | Train Acc: 49.90 | Val Acc: 99.69

The model starts off with very high validation accuracy and only improves from there. Since I use 0.5 dropout after each of the 2 fully connected layers, I’m hoping that explains why training accuracy stays below 50%.
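One thing that may be worth ruling out: if the “Train Acc” column is computed during the training pass, dropout is active and the batches come from the weighted sampler, so it is not directly comparable to validation accuracy. A minimal sketch of measuring accuracy in eval mode (dropout off, BatchNorm using its running statistics), assuming the loader yields the inputs followed by the label as in the training loop; `accuracy` and the toy model are hypothetical:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def accuracy(model, loader):
    """Accuracy (%) with dropout disabled; restores the previous train/eval mode."""
    was_training = model.training
    model.eval()
    correct = total = 0
    for *inputs, y in loader:  # e.g. (x1, x2, x3, y_true) batches
        logits = model(*inputs)
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.numel()
    model.train(was_training)
    return 100.0 * correct / total


# Toy check: an identity layer followed by heavy dropout.
model = nn.Sequential(nn.Linear(2, 2, bias=False), nn.Dropout(0.9))
with torch.no_grad():
    model[0].weight.copy_(torch.eye(2))
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
y = torch.tensor([0, 1, 0])
print(accuracy(model, [(x, y)]))  # dropout is off, so the argmax is deterministic
```

Calling it on an unweighted loader over the training set gives a number that can be compared like-for-like with validation accuracy.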

My issue is that the model does not generalize well to new examples outside the training set. Validation examples are predicted correctly with high confidence (~0.5+). However, new examples with slight variations in name or specification return either the wrong class or the right one with very low confidence (~0.005).
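Since the model returns unnormalized scores (the softmax layer is commented out in the forward pass), confidence values like these come from applying a softmax afterwards. A minimal sketch, with `top_prediction` as a hypothetical helper:

```python
import torch
import torch.nn.functional as F


def top_prediction(logits):
    """Return (class_ids, probabilities) of the most likely class for each row."""
    probs = F.softmax(logits, dim=1)  # convert raw scores to probabilities
    conf, cls = probs.max(dim=1)
    return cls, conf


logits = torch.tensor([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
cls, conf = top_prediction(logits)
print(cls, conf)  # second row is uniform, so its confidence is 1/3
```

For scale, a uniform guess over 1,000 classes gives each class probability 0.001, so a top probability of ~0.005 is only about five times chance.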

Thank you for reading this far; I would really appreciate any advice. Happy to provide other details if they help.

Here’s the main section from the forward function I’m working with:

# Separate embeddings created for product name, specifications, and category.
# Create ngram (1..5) filters
# Max pool & flatten

# These convolutional layer outputs are concatenated
x = torch.cat((n1, n2, n3, n4, n5, a1, a2, a3, a4, a5, c1, c2, c3, c4, c5), 2)
x = x.reshape(x.size(0), -1)

# ReLU, Batch Norm & Dropout 1
x = F.relu(x)
x = self.bn1(x)
x = self.dropout_conv(x)

# Fully connected layer 1 
x = self.fc1(x)

# ReLU, Batch Norm & Dropout 2
x = F.relu(x)
x = self.bn2(x)
x = self.dropout_fc(x)

# Fully connected layer 2 
x = self.fc2(x)

# ReLU, Batch Norm & Dropout 3
x = F.relu(x)
x = self.bn3(x)
x = self.dropout_fc(x)
# No softmax here: nn.CrossEntropyLoss applies log-softmax to the raw scores itself
# x = self.sm(x)
return x
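In the forward pass above, ReLU, BatchNorm and dropout are also applied after `fc2`, so the scores passed to `nn.CrossEntropyLoss` are clipped at zero and, during training, randomly zeroed by dropout. That is usually not intended for a final classification layer and might also help explain the low training accuracy. A minimal sketch of a head that returns raw logits instead, assuming the concatenated convolutional features are already flattened to shape `(batch, in_features)`; `ClassifierHead`, its sizes and the shared dropout rate `p` are hypothetical:

```python
import torch
import torch.nn as nn


class ClassifierHead(nn.Module):
    """Hypothetical head: flattened conv features -> fc1 -> fc2 -> raw logits."""

    def __init__(self, in_features: int, hidden: int, num_classes: int, p: float = 0.5):
        super().__init__()
        self.bn1 = nn.BatchNorm1d(in_features)
        self.dropout_conv = nn.Dropout(p)
        self.fc1 = nn.Linear(in_features, hidden)
        self.bn2 = nn.BatchNorm1d(hidden)
        self.dropout_fc = nn.Dropout(p)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU, batch norm and dropout on the convolutional features
        x = self.dropout_conv(self.bn1(torch.relu(x)))
        # Hidden layer with the same ReLU -> batch norm -> dropout pattern
        x = self.dropout_fc(self.bn2(torch.relu(self.fc1(x))))
        # Final layer: no ReLU, batch norm, dropout or softmax, because
        # nn.CrossEntropyLoss expects unnormalized logits that may be negative.
        return self.fc2(x)


head = ClassifierHead(in_features=12, hidden=8, num_classes=5)
features = torch.randn(4, 12)
logits = head(features)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
print(logits.shape, loss.item())
```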

And finally, the main section from the train function:

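# Imports assumed by this excerpt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm

# Initialize loaders (the weighted sampler only affects training batches)
loader_train = DataLoader(train, batch_size=32, sampler=weighted_sampler, pin_memory=True, num_workers=4)
loader_test = DataLoader(test, batch_size=32, pin_memory=True, num_workers=4)
# Loss + optimizer (CrossEntropyLoss applies log-softmax itself, so the model should return raw logits)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# One-cycle learning-rate schedule. OneCycleLR overrides the optimizer's lr=0.01: with its defaults
# it warms up from max_lr / 25 to max_lr = 0.001, then anneals toward max_lr / 25 / 1e4.
# `epochs` must match the training loop, or scheduler.step() raises once the schedule is exhausted.
scheduler = optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.001,
                                          steps_per_epoch=len(loader_train), epochs=params.epochs)

# Starts training phase
for epoch in range(params.epochs):

    # Set model in training mode (enables dropout and batch-norm statistic updates)
    model.train()

    # Starts batch training
    for x1, x2, x3, y_true in tqdm(loader_train, total=len(loader_train), leave=False):
        y_true = y_true.long()
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Forward + backward + optimize
        y_pred = model(x1, x2, x3)
        loss = criterion(y_pred, y_true)
        loss.backward()
        optimizer.step()

        # Scheduler step (OneCycleLR is stepped once per batch)
        scheduler.step()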
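To see the learning-rate range a `OneCycleLR` schedule actually produces, a quick sketch with a stand-in model and hypothetical step counts. With the default `div_factor=25` and `final_div_factor=1e4`, it starts at `max_lr / 25`, peaks at `max_lr`, and ends near `max_lr / 25 / 1e4`, regardless of the `lr` passed to the optimizer. It also supports exactly `steps_per_epoch * epochs` scheduler steps; stepping further raises an error, so `epochs` should match the number of epochs actually trained.

```python
import torch
from torch import nn, optim

model = nn.Linear(4, 2)  # stand-in model; only the schedule matters here
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
steps_per_epoch, epochs = 100, 10  # hypothetical values
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.001, steps_per_epoch=steps_per_epoch, epochs=epochs
)

lrs = []
for _ in range(steps_per_epoch * epochs):
    lrs.append(scheduler.get_last_lr()[0])  # lr used for this step
    optimizer.step()  # no gradients here, so parameters are left unchanged
    scheduler.step()

print(f"start={lrs[0]:.2e} peak={max(lrs):.2e} end={lrs[-1]:.2e}")
```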

Getting >99% accuracy on the validation set usually means the validation set was accidentally part of the train set.
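A sketch of one such check, assuming each split can be iterated as `(name, specification, category)` tuples of raw strings; `normalize` and `leaked_rows` are hypothetical helpers. It flags validation rows whose inputs, after lowercasing and collapsing whitespace, also occur in the training split. Exact matches are only a lower bound: near-duplicate listings (the same product with a slightly different name) can inflate validation accuracy too.

```python
def normalize(name, spec, category):
    """Hypothetical matching key: lowercase each field and collapse whitespace."""
    return tuple(" ".join(str(field).lower().split()) for field in (name, spec, category))


def leaked_rows(train_rows, val_rows):
    """Return validation rows whose normalized inputs also appear in training."""
    seen = {normalize(*row) for row in train_rows}
    return [row for row in val_rows if normalize(*row) in seen]


train_rows = [("Amazon Basics Pencil", "Length: 5 Inches", "Pencils & Pens")]
val_rows = [
    ("amazon basics  pencil", "Length: 5 inches", "Pencils & Pens"),  # same product
    ("Amazon Basics Pen", "Length: 5 Inches", "Pencils & Pens"),  # different product
]
print(leaked_rows(train_rows, val_rows))
```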

An easy way to check this is to make your test and validation sets only 10 examples each and literally print the data that’s being passed through.
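A minimal sketch of that check, assuming `train` and `test` are map-style PyTorch datasets like the ones passed to `DataLoader` in the training code; `peek` is a hypothetical helper and the toy `TensorDataset` only stands in for real data. It takes the first `n` items in order (no sampler, no shuffling) and prints each batch exactly as a `DataLoader` would deliver it.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset


def peek(dataset, n=10, batch_size=5):
    """Print (and return) the first n items of a dataset, batched by a DataLoader."""
    subset = Subset(dataset, range(min(n, len(dataset))))
    batches = []
    for batch in DataLoader(subset, batch_size=batch_size):
        print(batch)
        batches.append(batch)
    return batches


toy = TensorDataset(torch.arange(40).reshape(20, 2), torch.arange(20))
batches = peek(toy)
```

Running it on both `train` and `test` and comparing the printed rows side by side makes duplicated examples easy to spot.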