[Newbie] Multi-target RuntimeError when classifying a dataset with a multi-class target

I am following the example code for Transfer Learning in the PyTorch documentation. My dataset, however, is not images: it consists of numpy arrays where each X sample has 15 values (read from 15 pixels) and y holds one of 5 classes (1, 2, 3, 4, 5). I will explain my approach step by step:

import numpy as np
import time 
import copy
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, TensorDataset, DataLoader, random_split
import torch.nn.functional as F

Here is what my X[0] and y[0] look like:
(array([739, 742, 734, 732, 746, 734, 736, 737, 728, 742, 741, 736, 736, 741, 316]), array([2]))

I create my dataset as:

class MyDataset(Dataset):
    def __init__(self, data, target, transform=None):
        self.data = torch.from_numpy(data).float()
        self.target = torch.from_numpy(target).long()
        self.transform = transform
        
    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]
        
        if self.transform:
            x = self.transform(x)
        
        return x, y
    
    def __len__(self):
        return len(self.data)

I split the X, y for training and validation as:

full_ds = MyDataset(X, y)

trn_size = int(0.8*len(full_ds))
val_size = len(full_ds) - trn_size
trn_ds, val_ds = random_split(full_ds, [trn_size, val_size])

datasets = {'train': trn_ds, 'val': val_ds}
dataloaders = {x: DataLoader(datasets[x], batch_size=4, shuffle=True) for x in ['train', 'val']}
dataset_sizes = {x: len(datasets[x]) for x in ['train', 'val']}

My model, optimizer and loss are:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(15, 8)
        self.fc2 = nn.Linear(8, 5)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

net = Net()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
criterion = nn.NLLLoss()

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

As I mentioned earlier, my training function follows the standard PyTorch example:

def train_model(model, criterion, optimizer, num_epochs=10):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
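The training call itself is roughly along these lines (a sketch, since I didn't paste the exact invocation):

net = net.to(device)
net = train_model(net, criterion, optimizer, num_epochs=10)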

When running the training function, I get the following runtime error:

RuntimeError: multi-target not supported at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:21

I am stuck here and I would appreciate your help!

Your problem is likely the dimension of the labels tensor.

In your MyDataset class, each target y is an array([2]). When the DataLoader batches these target labels, it produces a tensor with an extra dimension (shape [batch_size, 1] instead of [batch_size]), which NLLLoss will complain about.

Calling labels.squeeze() before the loss (i.e. loss = criterion(outputs, labels.squeeze())) should fix the error.
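For illustration, a minimal sketch of the shape issue (the batch size and class count here are just assumptions matching this thread):

import torch
import torch.nn as nn

criterion = nn.NLLLoss()
log_probs = torch.randn(4, 5).log_softmax(dim=1)  # [batch_size, n_classes], like your model output
labels = torch.tensor([[1], [0], [3], [2]])       # [4, 1], how the DataLoader batches your array([2]) targets
# criterion(log_probs, labels) raises "multi-target not supported"
loss = criterion(log_probs, labels.squeeze())     # a [4] target works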

Also have a look at this issue which describes a similar problem: #3670

Thanks for the suggestions. After trying loss = criterion(outputs, labels.squeeze()), I now get a new runtime error:

    RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

Is it still related to dimensions?

PyTorch expects class labels ranging from 0 to (number of classes - 1); in your case, from 0 to 4.

Can you let me know how I should fix my code to reflect your suggestion? I tried changing the output dimension in my model to self.fc2 = nn.Linear(8, 4), but I get the same error.

Oh ok. You should subtract 1 from y in your dataset code:

return x, y-1
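In other words, the end of your __getitem__ would look like this (a sketch, assuming the labels really are 1 to 5 as you described):

def __getitem__(self, index):
    x = self.data[index]
    y = self.target[index]

    if self.transform:
        x = self.transform(x)

    # shift labels 1..5 down to 0..4 so they are valid class indices for NLLLoss
    return x, y - 1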

After changing y to y-1 in the dataset class, with

loss = criterion(outputs, labels.squeeze())

the error is:

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

and with

loss = criterion(outputs, labels)

, the error is:

RuntimeError: multi-target not supported at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:21

Can you print and make sure that the labels are between 0 and 4 in your training loop?

After changing y to y-1 in the dataset, I tried:

for epoch in range(1):
    net.train()
    for inputs, labels in dataloaders['train']:
        optimizer.zero_grad()
        with torch.set_grad_enabled(True):
            labels.squeeze()
            print (labels)

The output was:

tensor([[1], [1], [3], [2]])

If I add

loss = criterion(outputs, labels)

I get the same error as above.

When you squeeze the labels, assign the result back to the same variable; squeeze() is not an in-place operation.

labels = labels.squeeze()
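For example, squeeze() returns a new tensor instead of modifying labels in place:

import torch

labels = torch.tensor([[1], [1], [3], [2]])  # shape [4, 1]
labels.squeeze()                             # returns a [4] tensor, but labels itself is unchanged
labels = labels.squeeze()                    # reassign: labels now has shape [4]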

After assigning the squeezed labels back to labels, printing labels gives:

tensor([1, 1, 1, 3])

However, the error from the loss is still the same:

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

@InnovArul any other suggestions?

Please verify that the inputs to your loss function have the appropriate sizes.
Just a simple working example:

import torch, torch.nn as nn

logits = torch.randn(10, 5)                             # [10 x 5]
target = torch.randint(0, 5, (10,), dtype=torch.long)   # [10], class labels from 0 to 4
loss = nn.CrossEntropyLoss()
loss_value = loss(logits, target)                       # input: ([10 x 5], [10]); all class labels are in 0..4
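Since your model returns log_softmax and you use nn.NLLLoss, the equivalent check, continuing the example above, looks like this (nn.CrossEntropyLoss simply combines LogSoftmax and NLLLoss):

log_probs = torch.randn(10, 5).log_softmax(dim=1)       # [10 x 5], like your model's output
target = torch.randint(0, 5, (10,), dtype=torch.long)   # [10], class labels from 0 to 4
loss_value = nn.NLLLoss()(log_probs, target)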

The error

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/conda_3.7/conda/conda-bld/pytorch_1544144746443/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

happens if the class labels are not within the valid range (from 0 to number of classes - 1).
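One quick way to verify this in your training loop (a sketch, assuming 5 classes as described above):

n_classes = 5
for inputs, labels in dataloaders['train']:
    labels = labels.squeeze()
    # every label must be a valid class index in [0, n_classes)
    assert labels.min().item() >= 0 and labels.max().item() < n_classes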

The sizes of inputs and labels from the dataloader are:

torch.Size([4, 15]) torch.Size([4, 1])

The "4" comes from the batch size. I tried inputs.squeeze() and labels.squeeze(), but the sizes do not change.

It should be possible to use labels = labels.squeeze(), in my view.
Can you share the code?

Please remember to use squeeze() at the appropriate place. Without your current code, it's not possible to spot the exact source of the error.

Thanks! For whatever reason squeeze() did not work for me; I had to use reshape instead. Additionally, I realized (contrary to my original statement) that my y data actually ranges from 20 to 24. I had to map the y data to the range 0 to 4, and then everything worked just fine.
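For reference, a rough sketch of the two changes that resolved it (the 20-24 range is specific to my data, so the offset is just mine):

# in MyDataset.__init__: map raw labels 20..24 to class indices 0..4
self.target = torch.from_numpy(target).long() - 20

# in the training loop: flatten the [batch_size, 1] labels to [batch_size]
labels = labels.reshape(-1)
loss = criterion(outputs, labels)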