[Newbie] Multi-target RuntimeError when classifying a dataset with a multi-class target

I am following the example code for Transfer Learning in the PyTorch documentation. My dataset, however, is not images: it consists of numpy arrays where each X sample has 15 values (read from 15 pixels) and y holds one of 5 classes (1, 2, 3, 4, 5). I will explain my approach step by step:

import numpy as np
import time 
import copy
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, TensorDataset, DataLoader, random_split
import torch.nn.functional as F

Here is what my X[0] and y[0] look like:
(array([739, 742, 734, 732, 746, 734, 736, 737, 728, 742, 741, 736, 736, 741, 316]), array([2]))

I create my dataset as:

class MyDataset(Dataset):
    def __init__(self, data, target, transform=None):
        self.data = torch.from_numpy(data).float()
        self.target = torch.from_numpy(target).long()
        self.transform = transform
        
    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]
        
        if self.transform:
            x = self.transform(x)
        
        return x, y
    
    def __len__(self):
        return len(self.data)

I split the X, y for training and validation as:

full_ds = MyDataset(X, y)

trn_size = int(0.8*len(full_ds))
val_size = len(full_ds) - trn_size
trn_ds, val_ds = random_split(full_ds, [trn_size, val_size])

datasets = {'train': trn_ds, 'val': val_ds}
dataloaders = {x: DataLoader(datasets[x], batch_size=4, shuffle=True) for x in ['train', 'val']}
dataset_sizes = {x: len(datasets[x]) for x in ['train', 'val']}

My model, optimizer and loss are:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(15, 8)
        self.fc2 = nn.Linear(8, 5)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

net = Net()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
criterion = nn.NLLLoss()

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

As I mentioned earlier, my training function follows the standard PyTorch example:

def train_model(model, criterion, optimizer, num_epochs=10):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
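The training call itself is roughly along these lines (a sketch, since I didn't paste the exact invocation):

net = net.to(device)
net = train_model(net, criterion, optimizer, num_epochs=10)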

When running the training function, I get the following runtime error:

RuntimeError: multi-target not supported at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:21

I am stuck here and I would appreciate your help!

Your problem is likely the dimension of the labels tensor.

In your MyDataset class, each target y is an array([2]). When the DataLoader batches these target labels, it produces a tensor with an extra dimension (shape [batch_size, 1] instead of [batch_size]), which NLLLoss will complain about.

Calling labels.squeeze() before the loss (i.e. loss = criterion(outputs, labels.squeeze())) should fix the error.
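For illustration, a minimal sketch of the shape issue (the batch size and class count here are just assumptions matching this thread):

import torch
import torch.nn as nn

criterion = nn.NLLLoss()
log_probs = torch.randn(4, 5).log_softmax(dim=1)  # [batch_size, n_classes], like your model output
labels = torch.tensor([[1], [0], [3], [2]])       # [4, 1], how the DataLoader batches your array([2]) targets
# criterion(log_probs, labels) raises "multi-target not supported"
loss = criterion(log_probs, labels.squeeze())     # a [4] target works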

Also have a look at this issue which describes a similar problem: #3670

Thanks for the suggestions. After trying loss = criterion(outputs, labels.squeeze()), I now get a new runtime error:

    RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

Is it still related to dimensions?

PyTorch expects class labels ranging from 0 to (number of classes - 1); in your case, from 0 to 4.

Can you let me know how I should fix my code to reflect your suggestion? I tried changing the output dimension in my model to self.fc2 = nn.Linear(8, 4), but I get the same error.

Oh ok. You should subtract 1 from y in your dataset code:

return x, y-1
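In other words, the end of your __getitem__ would look like this (a sketch, assuming the labels really are 1 to 5 as you described):

def __getitem__(self, index):
    x = self.data[index]
    y = self.target[index]

    if self.transform:
        x = self.transform(x)

    # shift labels 1..5 down to 0..4 so they are valid class indices for NLLLoss
    return x, y - 1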

After changing y to y-1 in the dataset class, with

loss = criterion(outputs, labels.squeeze())

the error is:

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

and with

loss = criterion(outputs, labels)

, the error is:

RuntimeError: multi-target not supported at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:21

Can you print and make sure that the labels are between 0 and 4 in your training loop?

After changing y to y-1 in the dataset, I tried:

for epoch in range(1):
    net.train()
    for inputs, labels in dataloaders['train']:
        optimizer.zero_grad()
        with torch.set_grad_enabled(True):
            labels.squeeze()
            print (labels)

The output was:

tensor([[1], [1], [3], [2]])

If I add

loss = criterion(outputs, labels)

I get the same error as above.

When you squeeze the labels, assign the result back to the same variable; squeeze() is not an in-place operation.

labels = labels.squeeze()
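For example, squeeze() returns a new tensor instead of modifying labels in place:

import torch

labels = torch.tensor([[1], [1], [3], [2]])  # shape [4, 1]
labels.squeeze()                             # returns a [4] tensor, but labels itself is unchanged
labels = labels.squeeze()                    # reassign: labels now has shape [4]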

After assigning the squeezed labels back to labels, printing labels gives:

tensor([1, 1, 1, 3])

However, the error from the loss is still the same:

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

@InnovArul any other suggestions?

Please verify that the inputs to your loss function have the appropriate sizes.
Just a simple working example:

import torch, torch.nn as nn

logits = torch.randn(10, 5)                             # [10 x 5]
target = torch.randint(0, 5, (10,), dtype=torch.long)   # [10], class labels from 0 to 4
loss = nn.CrossEntropyLoss()
loss_value = loss(logits, target)                       # input: ([10 x 5], [10]); all class labels are in 0..4
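Since your model returns log_softmax and you use nn.NLLLoss, the equivalent check, continuing the example above, looks like this (nn.CrossEntropyLoss simply combines LogSoftmax and NLLLoss):

log_probs = torch.randn(10, 5).log_softmax(dim=1)       # [10 x 5], like your model's output
target = torch.randint(0, 5, (10,), dtype=torch.long)   # [10], class labels from 0 to 4
loss_value = nn.NLLLoss()(log_probs, target)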

The error

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

happens if the class labels are not within the valid range (from 0 to number of classes - 1).
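One quick way to verify this in your training loop (a sketch, assuming 5 classes as described above):

n_classes = 5
for inputs, labels in dataloaders['train']:
    labels = labels.squeeze()
    # every label must be a valid class index in [0, n_classes)
    assert labels.min().item() >= 0 and labels.max().item() < n_classes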

The sizes of inputs and labels from the dataloader are:

torch.Size([4, 15]) torch.Size([4, 1])

The "4" comes from the batch size. I tried inputs.squeeze() and labels.squeeze(), but the sizes do not change.

It should be possible to use labels = labels.squeeze(), in my view.
Can you share the code?

Please remember to use squeeze() at the appropriate place. Without your current code, it's not possible to spot the exact source of the error.

Thanks! For whatever reason squeeze() did not work for me; I had to use reshape instead. Additionally, I realized (contrary to my original statement) that my y data actually ranges from 20 to 24. I had to map the y data to the range 0 to 4, and then everything worked just fine.
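For reference, a rough sketch of the two changes that resolved it (the 20-24 range is specific to my data, so the offset is just mine):

# in MyDataset.__init__: map raw labels 20..24 to class indices 0..4
self.target = torch.from_numpy(target).long() - 20

# in the training loop: flatten the [batch_size, 1] labels to [batch_size]
labels = labels.reshape(-1)
loss = criterion(outputs, labels)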