Target and input must have the same number of elements

My input (model output) size is torch.Size([30, 2, 96, 96, 96])

My labels size is torch.Size([30, 96, 96, 96]). I'm feeding them to my loss function, which looks like this:

loss = F.binary_cross_entropy(F.sigmoid(output),labels,torch.FloatTensor(CLASS_WEIGHTS).cuda())

When I run this, though, I get

ValueError: Target and input must have the same number of elements. target nelement (26542080) != input nelement (53084160)

I'm a little confused here. I understand that the input has twice as many elements as the target because [30, 96, 96, 96] is multiplied by the number of classes, but I'm not sure why that is a problem or how to fix it. Any suggestions would be helpful. Thanks in advance.

nn.BCELoss needs inputs and targets of the same shape, since each element individually could predict the occurrence of the class.
Probably you are looking for nn.NLLLoss or nn.CrossEntropyLoss.
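As a rough sketch of that suggestion (the small tensor sizes and the weight values below are made up for illustration, standing in for the [30, 2, 96, 96, 96] output and CLASS_WEIGHTS from the question), nn.CrossEntropyLoss accepts logits with a class dimension and a target that holds one class index per voxel:

```python
import torch
import torch.nn as nn

# Small stand-in shapes (assumed) for the [30, 2, 96, 96, 96] output in the question.
N, C, D, H, W = 2, 2, 4, 4, 4
output = torch.randn(N, C, D, H, W)         # raw logits, one channel per class
labels = torch.randint(0, C, (N, D, H, W))  # one class index per voxel (int64)

# Illustrative per-class weights, standing in for CLASS_WEIGHTS.
class_weights = torch.tensor([0.3, 0.7])

# CrossEntropyLoss applies log-softmax internally, so it takes the raw logits
# (no sigmoid or softmax beforehand) and a target without the class dimension.
criterion = nn.CrossEntropyLoss(weight=class_weights)
loss = criterion(output, labels)
print(loss.item())
```

Here the input has C times as many elements as the target, which is exactly what this loss expects.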

I have the same issue. I tried nn.CrossEntropyLoss and then BCELoss and got errors in both cases. I just tried nn.NLLLoss and got this error:
“bool value of Tensor with more than one value is ambiguous”

I cannot understand what is the problem. My images are read in as PIL images in RGB mode and then transformed this way:

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

Masks are .PNG images read with the PIL library in mode L (8-bit grayscale). I applied these transformations:

class Train_Dataset(Dataset):

    def __init__(self, imgarray, maskarray, transform=None):
        self.images = imgarray
        self.masks = maskarray
        self.transform = transform

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)

        #label = torch.from_numpy(self.masks[index])
        trnsf1 = transforms.RandomResizedCrop(input_size)
        trnsf2 = transforms.ToTensor()
        label = trnsf2(trnsf1(self.masks[index]))

        return img, label

    def __len__(self):
        return len(self.images)

##-----------------------------------------------------------------------------------

class val_Dataset(Dataset):

    def __init__(self, imgarray_val, maskarray_val, transform=None):
        self.images = imgarray_val
        self.masks = maskarray_val
        self.transform = transform

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)

        #label = torch.from_numpy(self.masks[index])
        trnsf1 = transforms.RandomResizedCrop(input_size)
        trnsf2 = transforms.ToTensor()
        label = trnsf2(trnsf1(self.masks[index]))

        return img, label

    def __len__(self):
        return len(self.images)

So basically I am resizing the image and then converting it into a tensor.
I am not certain whether converting a mask image into a tensor this way is reasonable, as my mask label contains class indices (each pixel has a discrete value between 0 and 3). When I transform this image into a tensor with ToTensor, the values will be between 0 and 1. I wonder how the computations will take place?
Also, my mask tensors have the shape [N, C, H, W] = [2, 1, 224, 224] and the output of the model has the shape [2, 4, 224, 224]. When using nn.CrossEntropyLoss, I get an error telling me that my target should have 3 dimensions and not 4. How should I change this? Should I drop the C dimension and create a tensor of shape [N, H, W]?

That’s a correct assumption and therefore I would transform the mask to a tensor via torch.from_numpy to keep the class indices. Some transformations, e.g. resizing or ToTensor might destroy your class indices.

Yes, that’s also correct. nn.CrossEntropyLoss as well as nn.NLLLoss expect a target without the channel dimension, which contains just the class indices.
In your case you could just call target = target.squeeze(1).
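A minimal sketch of that call, using random tensors with assumed small spatial sizes in place of the real [2, 4, 224, 224] output and [2, 1, 224, 224] mask:

```python
import torch
import torch.nn as nn

# Stand-in tensors: 4 classes, small 8x8 spatial size chosen for illustration.
output = torch.randn(2, 4, 8, 8)            # [N, C, H, W] logits
target = torch.randint(0, 4, (2, 1, 8, 8))  # [N, 1, H, W] mask with class indices

# Drop the channel dimension so the target becomes [N, H, W].
target = target.squeeze(1)

loss = nn.CrossEntropyLoss()(output, target.long())
print(target.shape, loss.item())
```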


Thank you for the answer.
I have another question now: if I am resizing the image so that it is suitable as my network input, isn't it mandatory to resize the mask too?
Also, as my mask images do not have the same size as my input images, what is the correct approach to choose here?

Yes, if you are resizing your input images, you should resize the masks, too.
However, make sure to use interpolation=PIL.Image.NEAREST for the masks, otherwise your class indices might be distorted.
Also, if you are planning on applying random data transformations, I would recommend using the functional API as described in this post to make sure both the image and mask are using the same random values.


Must I use interpolation=PIL.Image.NEAREST for both labels and input images?
Another issue that comes to mind here is that:
My labels are read from json files, converted into numpy arrays and saved into a list. In what order should I apply the changes?

  1. First convert the label into a PIL image,
  2. resize using 'interpolation=PIL.Image.NEAREST',
  3. then convert the resized image into a numpy array again,
  4. and then apply torch.from_numpy?

If I change this order somehow, will it affect the final result?

No, you can apply any interpolation on the image (as long as your training pipeline works fine, as some interpolation methods might work better than others). The main concern is to keep your mask with class indices.

PIL images convert to and from numpy arrays without changing the stored values (via np.asarray and Image.fromarray), so your workflow should work fine.
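The four steps could be sketched roughly like this, assuming the labels are 2D integer arrays with class indices 0 to 3 (the helper name label_to_tensor and the random test label are made up for illustration):

```python
import numpy as np
import torch
from PIL import Image


def label_to_tensor(label_np, size=(224, 224)):
    """Hypothetical helper: numpy label -> PIL -> nearest resize -> numpy -> tensor."""
    pil = Image.fromarray(label_np.astype(np.uint8))  # 1. numpy array to PIL (mode "L")
    pil = pil.resize(size, Image.NEAREST)             # 2. resize without mixing classes
    resized = np.array(pil)                           # 3. back to a numpy array
    return torch.from_numpy(resized).long()           # 4. tensor of class indices


# Made-up label with class indices 0..3 and an arbitrary original size.
label_np = np.random.randint(0, 4, (300, 280))
label = label_to_tensor(label_np)
print(label.shape, label.dtype, torch.unique(label).tolist())
```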

Once you have created your Dataset just make sure that your mask and image tensors are still valid. I would simply read some random samples and e.g. visualize the mask.
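A quick check along those lines might look like the sketch below. ToySegDataset and check_samples are hypothetical stand-ins; with a real Dataset you would pass your own instance and could additionally plot the mask:

```python
import random

import torch
from torch.utils.data import Dataset


class ToySegDataset(Dataset):
    """Hypothetical stand-in for a real segmentation dataset."""

    def __init__(self, n=8, num_classes=4, size=16):
        self.images = torch.randn(n, 3, size, size)
        self.masks = torch.randint(0, num_classes, (n, size, size))

    def __getitem__(self, index):
        return self.images[index], self.masks[index]

    def __len__(self):
        return len(self.images)


def check_samples(dataset, num_classes, k=3):
    """Read k random samples and verify that image and mask are still valid."""
    for index in random.sample(range(len(dataset)), k):
        img, mask = dataset[index]
        assert img.shape[-2:] == mask.shape[-2:], "image and mask sizes differ"
        assert mask.dtype == torch.long, "mask should hold integer class indices"
        values = torch.unique(mask)
        assert values.min() >= 0 and values.max() < num_classes, "unexpected class index"
        print(index, tuple(img.shape), tuple(mask.shape), values.tolist())


check_samples(ToySegDataset(), num_classes=4)
```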

Thank you. I proceeded this way:
class val_Dataset(Dataset):

    def __init__(self, imgarray_val, labelarray_val, transform=None):
        self.images = imgarray_val
        self.labels = labelarray_val
        self.transform = transform

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)

        Img = Image.fromarray(self.labels[index])  # numpy array to PIL
        PilImg = Img.resize((224, 224), Image.NEAREST)  # resize the PIL image
        label = torch.from_numpy(np.asarray(PilImg))  # convert PIL to numpy and then the numpy array to a tensor

        #trnsf1 = transforms.RandomResizedCrop(input_size)
        #trnsf2 = transforms.ToTensor()
        #label = trnsf2(trnsf1(self.masks[index]))

        print(img.size())
        print(label.size())

        return img, label

    def __len__(self):
        return len(self.images)

Now I encounter this error:
‘RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 231 and 228 in dimension 3 at …\aten\src\TH/generic/THTensor.cpp:711’

what I get as output is this:

Epoch 0/0

torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
image shape: torch.Size([4, 3, 224, 224])
label shape: torch.Size([4, 224, 224])
squeezed label shape:torch.Size([4, 224, 224])
model output shape:torch.Size([4, 4, 224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
image shape: torch.Size([4, 3, 224, 224])
label shape: torch.Size([4, 224, 224])
squeezed label shape:torch.Size([4, 224, 224])
model output shape:torch.Size([4, 4, 224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
image shape: torch.Size([4, 3, 224, 224])
label shape: torch.Size([4, 224, 224])
squeezed label shape:torch.Size([4, 224, 224])
model output shape:torch.Size([4, 4, 224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
torch.Size([3, 224, 224]) torch.Size([224, 224])
image shape: torch.Size([4, 3, 224, 224])
label shape: torch.Size([4, 224, 224])
squeezed label shape:torch.Size([4, 224, 224])
model output shape:torch.Size([4, 4, 224, 224])
train Loss: 1.7062 Acc: 29483.5625
torch.Size([3, 224, 231])
torch.Size([224, 224])
torch.Size([3, 224, 228])
torch.Size([224, 224])
torch.Size([3, 224, 232])
torch.Size([224, 224])
torch.Size([3, 224, 229])
torch.Size([224, 224])

and the training function is as follows:

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, has_aux = True):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())#??????
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.

            for inputs, labels in dataloaders[phase]:
                inputs, labels = Variable(inputs), Variable(labels)
                inputs = inputs.to('cpu')
                labels = labels.to('cpu')

                print('image shape: {}'.format(inputs.shape))
                print('label shape: {}'.format(labels.shape))
                #labels = labels.squeeze(1)
                #print('squeezed label shape:{}'.format(labels.shape))

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if has_aux and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs = model(inputs)
                        print('model output shape:{}'.format(outputs['out'].shape))
                        loss1 = criterion(outputs['out'], labels.long())
                        loss2 = criterion(outputs['aux'], labels.long())
                        #loss2 = output['aux']
                        loss = loss1 + 0.4*loss2
                    #else:
                    #    outputs = model(inputs)
                    #    loss = criterion(outputs, labels.long())

                    _, preds = torch.max(outputs['out'], 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data.long())

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

What I have understood so far is that the transformation works fine for the train images but not for the validation images. I have been checking my code, though, and I cannot find the error!
These are the dataset classes and data loaders:

data_transforms = {
    'train': transforms.Compose([
        transforms.Resize(input_size, input_size),
        #transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size, input_size),
        #transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
print("Initializing Datasets and Dataloaders...")

class Train_Dataset(Dataset):

    def __init__(self, imgarray, labelarray, transform=None):
        self.images = imgarray
        self.labels = labelarray
        self.transform = transform

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)

        Img = Image.fromarray(self.labels[index])  # numpy array to PIL
        PilImg = Img.resize((224, 224), Image.NEAREST)  # resize the PIL image
        label = torch.from_numpy(np.asarray(PilImg))  # convert PIL to numpy and then the numpy array to a tensor

        print(img.size(), label.size())
        return img, label

    def __len__(self):
        return len(self.images)

class val_Dataset(Dataset):

    def __init__(self, imgarray_val, labelarray_val, transform=None):
        self.images = imgarray_val
        self.labels = labelarray_val
        self.transform = transform

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)

        Img = Image.fromarray(self.labels[index])  # numpy array to PIL
        PilImg = Img.resize((224, 224), Image.NEAREST)  # resize the PIL image
        label = torch.from_numpy(np.asarray(PilImg))  # convert PIL to numpy and then the numpy array to a tensor

        #trnsf1 = transforms.RandomResizedCrop(input_size)
        #trnsf2 = transforms.ToTensor()
        #label = trnsf2(trnsf1(self.masks[index]))

        print(img.size(), label.size())

        return img, label

    def __len__(self):
        return len(self.images)

I commented out two lines of code in the transformation code block.
For the train set:
#transforms.RandomHorizontalFlip(),
and for the val set:
#transforms.CenterCrop(input_size),
Might this be the problem?

Could you try to pass the size as a tuple to Resize?

transforms.Resize((input_size, input_size))
# instead of
transforms.Resize(input_size, input_size)

The second argument is used as the interpolation argument, and should throw an error if you are trying to resize the input. If one side of the passed image already has the desired size, this might be a no-op, thus not throwing the error.

I changed the argument into a tuple and now it is working. Thank you very much!