I’ve been working on a multi-class classifier and have hit a wall with getting it to generalize well. It uses a product name (e.g. “Amazon Basics Pencil”), a specification (e.g. “Length”: “5 Inches”), and a category (e.g. “Pencils & Pens”) to predict one of 1000+ classes.
These classes are also product categories, but are usually more specific than the example (e.g. “Pencils”). They’re also imbalanced, so I used a weighted sampler to ensure the validation set is diverse.
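For readers unfamiliar with weighted sampling, the usual idea is to draw each example with probability inversely proportional to the size of its class. The snippet below is a minimal sketch in plain Python with made-up labels; `labels` and `inverse_frequency_weights` are hypothetical names, not the code used for this model. In PyTorch, a weight list like this would typically be handed to `torch.utils.data.WeightedRandomSampler`.

```python
from collections import Counter
import random


def inverse_frequency_weights(labels):
    """Return one sampling weight per example: 1 / (size of its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Toy, made-up labels: class 0 is nine times more common than class 1.
labels = [0] * 900 + [1] * 100
weights = inverse_frequency_weights(labels)

# Draw with replacement using the weights; both classes should now
# appear about equally often despite the 9:1 imbalance in the data.
rng = random.Random(0)
drawn = rng.choices(labels, weights=weights, k=10_000)
share_of_rare_class = drawn.count(1) / len(drawn)
```

Because the draws are made with replacement, examples from rare classes are repeated many times per epoch, which is worth keeping in mind when reading per-epoch accuracy figures.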
I have 200K+ labelled examples, with a minimum of 25 per class. Results over 10 epochs are here:

    Epoch 01: | Train Loss: 4.38 | Val Loss: 1.26 | Train Acc: 43.46 | Val Acc: 97.20
    # Other epochs
    Epoch 10: | Train Loss: 3.47 | Val Loss: 0.41 | Train Acc: 49.90 | Val Acc: 99.69
The model starts off with very high validation accuracy and only gets higher from there. Since I use 0.5 dropout after each of the 2 fully connected layers, I’m hoping the train acc sticking under 50% is related.
My issue is that the model does not generalize well to new examples outside of the training set. Predicting validation set examples returns the right class with high confidence (~0.5+). However, new examples with slight variations in name or specification return either the wrong prediction or the right one with low confidence (~0.005+).
Thank you for getting this far. I would really appreciate any advice, and I'm happy to provide other details if that helps.
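To put those confidence values in context: with 1000 classes, a model that knows nothing assigns about 0.001 to each class, so a confidence near 0.005 is only a few times above chance. The sketch below shows how a confidence score is usually derived from raw logits with softmax. The logits are invented for illustration, `softmax` is a hypothetical helper, and the class count of exactly 1000 is an assumption.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


num_classes = 1000  # assumption: the post only says "1000+" classes

# Invented logits: every class scores 0 except one mildly preferred class.
logits = [0.0] * num_classes
logits[42] = 1.6

probs = softmax(logits)
confidence = max(probs)              # probability assigned to the top class
predicted = probs.index(confidence)  # index of the top class
chance = 1.0 / num_classes           # confidence of a uniform guess
```

With these made-up numbers the top class is still the preferred one, but its probability lands close to 0.005 because the remaining mass is spread over 999 other classes.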
Here’s the main section from the forward function I’m working with:

    # Separate embeddings created for product name, specifications, and category.
    # Create ngram (1..5) filters
    # Max pool & flatten
    # These convolutional layer outputs are concatenated
    x = torch.cat((n1, n2, n3, n4, n5, a1, a2, a3, a4, a5, c1, c2, c3, c4, c5), 2)
    x = x.reshape(x.size(0), -1)

    # RELU, Batch Norm & Dropout 1
    x = F.relu(x)
    x = self.bn1(x)
    x = self.dropout_conv(x)

    # Fully connected layer 1
    x = self.fc1(x)

    # RELU, Batch Norm & Dropout 2
    x = F.relu(x)
    x = self.bn2(x)
    x = self.dropout_fc(x)

    # Fully connected layer 2
    x = self.fc2(x)

    # RELU, Batch Norm & Dropout 3
    x = F.relu(x)
    x = self.bn3(x)
    x = self.dropout_fc(x)

    # Softmax output
    # x = self.sm(x)
    return x
And finally, the main section from the train function:

    # Initialize loaders
    loader_train = DataLoader(train, batch_size=32, sampler=weighted_sampler,
                              pin_memory=True, num_workers=4)
    loader_test = DataLoader(test, batch_size=32, pin_memory=True, num_workers=4)

    # Loss + optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # Scheduler for dynamic learning rate (Between 0.01 and 0.001)
    scheduler = optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.001,
                                              steps_per_epoch=len(loader_train),
                                              epochs=10)

    # Starts training phase
    for epoch in range(params.epochs):
        # Set model in training mode
        model.train()

        # Starts batch training
        for x1, x2, x3, y_true in tqdm(loader_train, total=len(loader_train), leave=False):
            y_true = y_true.type(torch.LongTensor)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward + backward + optimize
            y_pred = model(x1, x2, x3)
            loss = criterion(y_pred, y_true)
            loss.backward()
            optimizer.step()

            # Scheduler step
            scheduler.step()
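One way to check which learning rates this optimizer and scheduler pairing actually produces is to trace the schedule on a throwaway model. The sketch below reuses the `SGD` and `OneCycleLR` settings from the snippet above; the `nn.Linear` stand-in and the step counts are hypothetical. With its default `div_factor` of 25, `OneCycleLR` starts from `max_lr / 25` and replaces the `lr` passed to the optimizer, so the trace is worth comparing against the intended range.

```python
import torch
from torch import nn, optim

# Stand-in model and step counts (hypothetical); only the optimizer and
# scheduler arguments mirror the training snippet.
model = nn.Linear(4, 2)
steps_per_epoch, epochs = 100, 10

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.001, steps_per_epoch=steps_per_epoch, epochs=epochs
)

lrs = []
for _ in range(steps_per_epoch * epochs):
    lrs.append(optimizer.param_groups[0]["lr"])  # rate used for this step
    optimizer.step()   # no gradients here; we only trace the schedule
    scheduler.step()

start_lr, peak_lr, end_lr = lrs[0], max(lrs), lrs[-1]
print(f"start={start_lr:.2e} peak={peak_lr:.2e} end={end_lr:.2e}")
```

Plotting `lrs` against the step index gives a quick visual check that the warm-up and annealing phases fall where expected.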