RuntimeError: multi-target not supported with a custom dataset in Ignite

Tanya_Boone · May 28, 2020, 3:23am

Hello! I am trying to train a neural network on a preprocessed CIFAR100 dataset. I converted the data from .npy files to .pt files and load them in my code:

train_images=torch.from_numpy(torch.load('train_images.pt')).type(torch.LongTensor)
test_images=torch.from_numpy(torch.load('test_images.pt')).type(torch.LongTensor)
train_labels=torch.from_numpy(torch.load('train_labels.pt')).type(torch.LongTensor)
test_labels=torch.from_numpy(torch.load('test_labels.pt')).type(torch.LongTensor)


train_transform = Compose([
    ToPILImage(mode='RGB'),
    Resize(image_size, BICUBIC),
    RandomAffine(degrees=2, translate=(0.02, 0.02), scale=(0.98, 1.02), shear=2, fillcolor=(124,117,104)),
    RandomHorizontalFlip(),
    Pad(4),
    ToTensor(),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

test_transform = Compose([
    ToPILImage(mode='RGB'),
    Resize(image_size, BICUBIC),
    Pad(4),    
    ToTensor(),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

class CIFAR100Dataset(Dataset):
    
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        # Use the transforms passed to the constructor, not the global train_transform
        self.transforms = transforms
    
    def __len__(self):
        return len(self.X)
        
    def __getitem__(self, i):
        data = self.X[i]
        data = np.asarray(data).astype(np.uint8).reshape(32, 32, 3)
        if self.transforms:
            data = self.transforms(data)
        if self.y is not None:
            return (data, self.y[i])
        else:
            return data

train_dataset = CIFAR100Dataset(train_images, train_labels, train_transform)
test_dataset = CIFAR100Dataset(test_images, test_labels, test_transform)

The problem is that when I try to train, I get this error:

RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1587428270644/work/aten/src/THCUNN/generic/ClassNLLCriterion.cu:18

So I looked at my target data: its size is [100, 100], so I changed it to [100] with

y=(torch.max(y, 1)[1]).type(torch.LongTensor).cuda()

in this part of the code

def update_fn(engine, batch):
    model.train()
    
    x = convert_tensor(batch[0], device=device, non_blocking=True)
    y = convert_tensor(batch[1], device=device, non_blocking=True)
    y=(torch.max(y, 1)[1]).type(torch.LongTensor).cuda()
    y_pred = model(x)
    # Compute loss 
    loss = criterion(y_pred,y)  

    optimizer.zero_grad()
    if use_amp:
        with amp.scale_loss(loss, optimizer, loss_id=0) as scaled_loss:
            scaled_loss.backward()
    else:
        loss.backward()
    optimizer.step()
    
    return {
        "batchloss": loss.item(),
    }

If I print y (my target), I get:

tensor([27, 40, 90, 26, 50, 41, 27, 93, 84, 82, 76, 85, 93, 57, 68, 89, 25, 18,
         9, 18, 40,  7, 26, 84, 64, 73, 43, 74, 49, 18, 22, 26, 31,  7, 67, 42,
         3, 96, 53, 38, 47, 99, 26, 55, 64, 22, 29, 81, 90, 19, 16, 79, 93, 17,
        95, 42, 34, 25, 29, 47,  2, 43, 32, 94, 13,  9, 14, 45, 90, 92,  2,  9,
        11, 62, 75, 54,  7, 45, 68,  5, 24, 66, 36, 72, 43, 68,  4, 56, 57, 31,
        52, 27, 49, 19,  2, 88, 33, 11, 48, 59], device='cuda:0')

The shape is now fine

torch.Size([100])
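
For reference, here is a minimal sketch of the same one-hot to class-index conversion on random data. The tensors `class_ids` and `one_hot` are made-up stand-ins for a real batch, not data from this thread. It uses `argmax`, which already returns int64 indices on the same device as its input, so the extra `.type(torch.LongTensor).cuda()` round trip should not be needed.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for one batch of one-hot labels: 100 samples, 100 classes.
class_ids = torch.randint(0, 100, (100,))
one_hot = F.one_hot(class_ids, num_classes=100)  # shape [100, 100]

# argmax over the class dimension recovers one class index per sample.
# The result is int64 and stays on the device of the input tensor.
targets = one_hot.argmax(dim=1)  # shape [100]

print(one_hot.shape, targets.shape, targets.dtype)
```

If the labels are stored one-hot on disk, the same conversion could also be applied once right after loading them, so that every consumer of the dataset sees 1D targets.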

Note: I used this same change in other code that does not use Ignite, and it works:

    model.eval()
    for data, target in test_loader:
        # move tensors to GPU if CUDA is available
        
       # if contador ==35000
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
          
        # forward pass: compute predicted outputs by passing inputs to the model

        output = model(data)
        values,indices=torch.max(target,1)
        target=indices

So my question is: why is it not working with Ignite? Am I missing something in the update_fn function?

Regards,

ptrblck · May 28, 2020, 6:03am

The shape and the values of y look correct, so Ignite shouldn’t raise an issue.
Are you sure you’ve rerun the complete code (or cells)?
If so, could you directly print the shapes of y_pred and y before passing them to the criterion?
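
A rough sketch of such a check, assuming the criterion is an NLL-based loss such as `nn.CrossEntropyLoss` that is given class-index targets (`y_pred` of shape [N, C], `y` of shape [N] and dtype `torch.long`). The helper `check_criterion_inputs` is a made-up name for illustration, not part of PyTorch or Ignite.

```python
import torch

def check_criterion_inputs(y_pred, y):
    """Hypothetical helper: return a list of problems found, empty if none."""
    problems = []
    if y_pred.dim() != 2:
        problems.append(f"y_pred should be [N, C], got {tuple(y_pred.shape)}")
    if y.dim() != 1:
        problems.append(f"y should be [N], got {tuple(y.shape)}")
    if y.dtype != torch.long:
        problems.append(f"y should be torch.long, got {y.dtype}")
    if y_pred.dim() == 2 and y.dim() == 1 and y_pred.shape[0] != y.shape[0]:
        problems.append("batch sizes of y_pred and y differ")
    return problems

y_pred = torch.randn(100, 100)
index_targets = torch.randint(0, 100, (100,))  # class indices, shape [100]
one_hot_like = torch.zeros(100, 100)           # 2D float targets, shape [100, 100]

print(check_criterion_inputs(y_pred, index_targets))
print(check_criterion_inputs(y_pred, one_hot_like))
```

Calling this right before every `criterion(...)` call in the script, not only inside `update_fn`, would show which call receives the 2D target.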

Tanya_Boone · May 28, 2020, 6:15am

Yes, this is what I am doing

# x.shape
torch.Size([100, 3, 40, 40])

#y values
tensor([ 1, 30, 63, 70, 37, 29, 40, 40, 16, 15,  3, 31, 34, 42, 96, 71, 53, 25,
        61, 64,  5, 34, 68, 23, 21, 32, 98, 86, 41, 15, 82, 77, 44, 79, 42, 28,
         9, 86, 17, 42, 98, 25, 66, 28, 15, 56, 45, 13, 86, 70, 37, 94, 68, 65,
         2, 90, 24, 11, 14, 18, 29, 46, 44,  7, 12, 31, 51, 80, 56, 35, 79, 92,
        89, 25, 17, 80, 90, 83, 19, 44, 17, 57, 98, 29, 20, 21, 29, 46, 82, 51,
        35, 51,  2, 27, 70, 17,  6, 35, 23, 48], device='cuda:0')
#y shape
torch.Size([100])

#y_pred
torch.Size([100, 100])

I am comparing with the same code using the regular CIFAR100 dataset (not the custom one), and I get the same sizes
(although in this case the batch size is 800):

torch.Size([800, 3, 40, 40])
torch.Size([800])
torch.Size([800, 100])

I am sure I am running the latest file, but I don’t know why it is not working with Ignite…

ptrblck · May 28, 2020, 6:17am

That should be correct. Let’s see if @vfdev-5 knows what might be going on.
Could you post a reproducible code snippet including the Ignite code?

vfdev-5 · May 28, 2020, 7:37am

@Tanya_Boone I think the problem is not related to Ignite but to the datatypes, the criterion used, etc.

Please post a minimal code snippet, maybe with random data, that performs a single training step.
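
Something along these lines could serve as a starting point. It is only a sketch: the linear model, the SGD optimizer and `nn.CrossEntropyLoss` are placeholder assumptions, since the actual model and criterion were not posted. The tensor shapes follow the ones printed above.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder model, criterion and optimizer (assumptions, not the poster's setup).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 40 * 40, 100))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Random batch with the shapes from the thread: images [100, 3, 40, 40], targets [100].
x = torch.randn(100, 3, 40, 40)
y = torch.randint(0, 100, (100,))

# A single training step on CPU.
model.train()
y_pred = model(x)
loss = criterion(y_pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(x.shape, y.shape, y_pred.shape, loss.item())
```

If this runs cleanly, swapping in the real model, criterion and one real batch, one piece at a time, should narrow down which part raises the error.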

PS: @ptrblck thanks for helping on the issue