Selected classes from CIFAR100 RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at ../aten/src/THNN/generic/ClassNLLCriterion.c:92

Hello everyone!
I’d appreciate it if anyone could help me with the following problem…
I’m trying to train a model on a limited number of classes from the CIFAR100 dataset, by creating a new dataset:

def limited_set(data, target_list):
    features_lim = []
    labels_lim = []

    for i in range(len(data.targets)):
        # keep only the samples whose label is in target_list
        if data.targets[i] in target_list:
            features_lim.append(data.data[i])
            labels_lim.append(data.targets[i])

    # transpose HWC -> CHW for PyTorch
    return np.asarray(np.transpose(features_lim, (0, 3, 1, 2))), np.asarray(labels_lim)

then passing the result of the function to the following class:

class MyDataset(Dataset):

    def __init__(self, data, target, transform=None):
        self.data = torch.from_numpy(data).float()
        self.target = torch.from_numpy(target).long()
        self.transform = transform

    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]

        if self.transform:
            x = self.transform(x)

        return x, y

    def __len__(self):
        return len(self.data)
train_dataset = MyDataset(features_new, labels_new, transform=transform_train)
train_dataloader = DataLoader(train_dataset, batch_size = 100, shuffle=True)

test_dataset = MyDataset(features_test, labels_test, transform=transform)
test_dataloader = DataLoader(test_dataset, batch_size = 100, shuffle=False)

Here is the test NN architecture (e.g. for 10 classes):

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, 1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, 1, padding=1)
        self.fc1 = nn.Linear(8 * 8 * 32, 300)  # 32 channels at 8x8 after two 2x2 poolings
        self.dropout1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(300, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 8 * 8 * 32)
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = self.fc2(x)
        return x

Train/evaluate function looks like this:

def train_nn(n_epochs):

    running_loss_history = []
    running_corrects_history = []
    val_running_loss_history = []
    val_running_corrects_history = []

    for e in range(n_epochs):

        running_loss = 0.0
        running_corrects = 0.0
        val_running_loss = 0.0
        val_running_corrects = 0.0

        for inputs, labels in train_dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            _, preds = torch.max(outputs, 1)
            running_loss += loss.item()
            running_corrects += torch.sum(preds == labels.data)

        # validation after each training epoch
        with torch.no_grad():
            for val_inputs, val_labels in test_dataloader:
                val_inputs = val_inputs.to(device)
                val_labels = val_labels.to(device)
                val_outputs = model(val_inputs)
                val_loss = criterion(val_outputs, val_labels)

                _, val_preds = torch.max(val_outputs, 1)
                val_running_loss += val_loss.item()
                val_running_corrects += torch.sum(val_preds == val_labels.data)

        epoch_loss = running_loss / len(train_dataloader)
        # divide by the number of samples, not the number of batches
        epoch_acc = running_corrects.float() / len(train_dataset)
        running_loss_history.append(epoch_loss)
        running_corrects_history.append(epoch_acc)

        val_epoch_loss = val_running_loss / len(test_dataloader)
        val_epoch_acc = val_running_corrects.float() / len(test_dataset)
        val_running_loss_history.append(val_epoch_loss)
        val_running_corrects_history.append(val_epoch_acc)
        print('epoch :', (e + 1))
        print('training loss: {:.4f}, acc {:.4f}'.format(epoch_loss, epoch_acc.item()))
        print('validation loss: {:.4f}, validation acc {:.4f}'.format(val_epoch_loss, val_epoch_acc.item()))

    return running_corrects_history, val_running_corrects_history

However, when I try to train the model, I get the following error: RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at …/aten/src/THNN/generic/ClassNLLCriterion.c:92

Unless I change the number of output classes back to the initial 100. The same thing happens when I try to train the model on CIFAR10 using fewer classes…

Does anyone know the reason/solution?
Many thanks in advance

I’m not sure how you are using limited_set. If you would like to store only certain classes, you should add some kind of condition to select the corresponding data samples and labels.

That being said, try adding an assert statement in the training loop to check for invalid values, i.e. (labels < 0).any() and (labels >= nb_limited_classes).any(). This should raise an exception when the invalid batch is being processed.
Then check the values of labels and make sure they only contain class indices in the required range.
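A minimal sketch of what that check and the usual fix look like: nn.CrossEntropyLoss expects targets in [0, n_classes), so the original CIFAR100 class indices kept by limited_set have to be remapped to a contiguous 0..N-1 range. The target_list and labels below are made-up example values, not taken from the thread:

```python
import torch

# hypothetical subset of CIFAR100 classes (any subset works)
target_list = [3, 17, 42, 58, 71]

# map each original class index to a contiguous 0..N-1 index,
# matching an output layer of size len(target_list)
class_to_idx = {c: i for i, c in enumerate(target_list)}

# example batch of labels as limited_set returns them (original indices)
labels = torch.tensor([42, 3, 71, 17])

# the sanity check suggested above: catch out-of-range targets early
assert not (labels < 0).any(), "negative class index in batch"
assert not (labels >= len(target_list)).any() or True  # original indices WILL fail this

# remap to contiguous indices before passing labels to the criterion
remapped = torch.tensor([class_to_idx[int(l)] for l in labels])
print(remapped)  # tensor([2, 0, 4, 1])
```

Applying the mapping inside limited_set (appending class_to_idx[data.targets[i]] instead of the raw label) keeps the rest of the training code unchanged.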


It seems that my limited_set function was doing something wrong…
Thanks a ton!