First semantic segmentation: Tensor dimension wrong

Hello,
I am trying to train a semantic segmentation network for the first time, and in general I am still quite new to PyTorch.

My inputs are RGB images and corresponding grayscale masks, in which the grayscale intensity encodes the pixel class.

Please see my first try in the following.
Right now I am getting the error: “Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [1, 1000, 1000] instead” from the line “output = model(img)” in the training loop.
The 1x1000x1000 input is the mask with the class labels.

How can this be solved, and do you have any further advice for me?

Thank you very much :slight_smile:

import torch
import os
import numpy as np
import torch.nn as nn
import torch.optim as optim

# DataLoader
from PIL import Image
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

class Dataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "Images")))) # Subfolder with images
        self.masks = list(sorted(os.listdir(os.path.join(root, "ImageLabels")))) # Subfolder with masks

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "Images", self.imgs[idx])
        mask_path = os.path.join(self.root, "ImageLabels", self.masks[idx])
        img = Image.open(img_path).convert("RGB") # Three channel rgb images
        mask = Image.open(mask_path) # Single channel and the pixelintensity corresponds to the class label

        if self.transforms is not None:
            img = self.transforms(img)
            mask = self.transforms(mask)
            
            img = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])(img)
            # No need to normalize masks, they are all equal
    
        return img, mask

    def __len__(self):
        return len(self.imgs)


transformations = transforms.Compose([
    transforms.CenterCrop(1000),
    transforms.ToTensor()
])


# Load data
dataset = Dataset('/images/', transforms=transformations)

# Split data (Random)
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, test_size])

dataloader_train=torch.utils.data.DataLoader(train_dataset,
        batch_size=10, shuffle=True)
dataloader_test=torch.utils.data.DataLoader(test_dataset,
        batch_size=10, shuffle=False)


# Model:
from torchvision import models
model = models.segmentation.deeplabv3_resnet101(pretrained=False, progress=True, num_classes=output_classes)
criterion = nn.BCEWithLogitsLoss().cuda
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()


# Use gpu if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)

model.to(device)


# Training
num_epochs = 1
for iter in range(num_epochs):
    # Training:
    model.train() 
    for i in range(len(train_loader)):
        img, mask = train_loader[i]
        img, mask = img.cuda(), mask.cuda()
        
        optimizer.zero_grad()
        img = img.to(device)
        mask = mask.to(device)
        output = model(img)
        loss = criterion(output, mask)
        print(loss.cpu())
        loss.backward()
        optimizer.step()
    
    # Validation:
    model.eval() 
    for i in range(len(test_loader)):
        img, mask = test_loader[i]
        img, mask = img.cuda(), mask.cuda()
        output = model.forward(img) # forward pass
        loss = criterion(output, mask)
        print(loss.cpu())

DataLoader shouldn’t support indexing, so this line of code should raise an error:

        img, mask = train_loader[i]

Which PyTorch version are you using? Maybe I’m not aware of such a change.
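
To illustrate the point about indexing, here is a minimal sketch. The random tensors and the `TensorDataset` are stand-ins I made up for the real image folder; only the `DataLoader` behaviour matters.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 4 RGB "images" and 4 single-channel "masks".
images = torch.randn(4, 3, 8, 8)
masks = torch.randint(0, 2, (4, 1, 8, 8)).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=2, shuffle=False)

# DataLoader defines no __getitem__, so indexing it raises a TypeError.
try:
    loader[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Iterating is the supported way: each step yields one (img, mask) batch.
batch_shapes = [(img.shape, mask.shape) for img, mask in loader]
print(indexing_failed, batch_shapes)
```

The underlying dataset can still be indexed (`loader.dataset[0]`), but that returns a single unbatched sample.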

Thanks for the information.
I found it like that somewhere…

I changed the dataloader loop and printed the size of the tensors inside it.
At least this now seems to be ok.
The image-tensors have the size torch.Size([1, 3, 1000, 1000]) and the masks have torch.Size([1, 1, 1000, 1000]).

This is the current training loop:

num_epochs = 1
for e in range(num_epochs):
    # Training:
    model.train() 
    for img, mask in dataloader_train:
        img, mask = img.cuda(), mask.cuda()
        
        #img = torch.squeeze(img, 0)
        #mask = torch.squeeze(mask, 0)
        
        # For debug only:
        #print(img.shape)
        #print(mask.shape)
        #print(torch.squeeze(img, 0).shape)
        #break
        
        img = img.to(device)
        mask = mask.to(device)
        output = model(img)
        loss = criterion(output, mask)
        print(loss.cpu())
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Now I get another error in the line output = model(img):

…python3.8/site-packages/torch/nn/functional.py", line 1666, in batch_norm
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

I am wondering why the tensors have four dimensions instead of three.
Because of that I used img = torch.squeeze(img, 0), mask = torch.squeeze(mask, 0) but with that I get the error:

…python3.8/site-packages/torch/nn/modules/conv.py", line 341, in conv2d_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [1, 1000, 1000] instead

Do I have to squeeze or not, and where does the corresponding error come from?
Thank you very much :slight_smile:
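
On the question of why the tensors have four dimensions: a small sketch, assuming random stand-in data instead of the real images. The dataset returns single samples of shape [channels, height, width], and the DataLoader stacks them along a new leading batch dimension, which is the layout convolution layers expect.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the image/mask dataset (2 small samples).
dataset = TensorDataset(torch.randn(2, 3, 16, 16), torch.zeros(2, 1, 16, 16))

# A single sample from the dataset has no batch dimension.
sample_img, sample_mask = dataset[0]
print(sample_img.shape)   # torch.Size([3, 16, 16])

# The DataLoader adds the batch dimension in dim 0, even for batch_size=1.
loader = DataLoader(dataset, batch_size=1)
batch_img, batch_mask = next(iter(loader))
print(batch_img.shape)    # torch.Size([1, 3, 16, 16])

# The 4-dimensional batch can go straight into a conv layer.
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
out = conv(batch_img)
print(out.shape)          # torch.Size([1, 4, 16, 16])

# squeeze(0) would strip the batch dimension again, giving a 3-dimensional tensor.
squeezed = torch.squeeze(batch_img, 0)
print(squeezed.shape)     # torch.Size([3, 16, 16])
```

So the squeeze is not needed; the four-dimensional batch is what the model wants.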

import torch
import os
import numpy as np
import torch.nn as nn
import torch.optim as optim

# DataLoader
from PIL import Image
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.relu = nn.ReLU()
        self.conv1 = nn.Conv2d(3, 64, (5, 5), (1, 1), (2, 2))
        self.conv2 = nn.Conv2d(64, 64, (3, 3), (1, 1), (1, 1))
        self.conv3 = nn.Conv2d(64, 32, (3, 3), (1, 1), (1, 1))
        self.conv4 = nn.Conv2d(32, 1, (3, 3), (1, 1), (1, 1))

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.conv4(x)
        return x


img = torch.randn(1, 3, 250, 250)  # dummy input batch
mask = torch.rand(1, 1, 250, 250)  # dummy target in [0, 1], as BCEWithLogitsLoss expects
# Model:
from torchvision import models
model = Net()
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()


# Use gpu if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)

model.to(device)


# Training
num_epochs = 1
for iter in range(num_epochs):
    # Training:
    model.train() 
    for i in range(10):
        # img, mask = train_loader[i]
        optimizer.zero_grad()
        # .to(device) also works on CPU-only machines, unlike .cuda()
        img = img.to(device)
        mask = mask.to(device)
        output = model(img)
        loss = criterion(output, mask)
        print(loss.cpu())
        loss.backward()
        optimizer.step()
    
    # Validation:
    model.eval()
    with torch.no_grad():  # no gradients needed for validation
        for i in range(10):
            # img, mask = test_loader[i]
            output = model(img)  # forward pass on the dummy batch
            loss = criterion(output, mask)
            print(loss.cpu())

This is toy code based on yours. Hope this helps.

The first error is thrown in a batch norm layer, which cannot calculate the current batch statistics for 1 value per channel.
The input shape of an image tensor is expected to be [batch_size, channels, height, width]. In your case the activation is a single sample with 256 channels and one pixel.
To avoid this error you could increase the spatial size of the initial input, increase the batch size, or remove some pooling operations from your model.
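
The batch norm behaviour described above can be reproduced in isolation. This is a minimal sketch with a standalone `nn.BatchNorm2d(256)` layer, not the actual DeepLabV3 model:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(256)
x = torch.randn(1, 256, 1, 1)  # one sample, 256 channels, one pixel

# In training mode there is only one value per channel, so batch
# statistics cannot be computed and a ValueError is raised.
bn.train()
try:
    bn(x)
    raised = False
except ValueError:
    raised = True
print(raised)

# More samples or more pixels give more than one value per channel.
y = bn(torch.randn(2, 256, 1, 1))  # larger batch
z = bn(torch.randn(1, 256, 4, 4))  # larger spatial size

# In eval mode the running statistics are used, so a single value is fine.
bn.eval()
w = bn(x)
```

As far as I know, the DeepLabV3 head in torchvision contains a global average pooling branch followed by batch norm, so with a batch of one sample that branch would see [1, 256, 1, 1] regardless of the input resolution. If that applies here, a larger batch (or `drop_last=True` in the DataLoader, to avoid a trailing batch of one) would be the relevant fix rather than a larger image.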

Thanks for the toy code!
I tried using it, but here I also get an error I could not solve.
The line loss = criterion(output, mask) leads to:

cuda() takes from 1 to 2 positional arguments but 3 were given

With the original code I tried increasing the batch size as well as the spatial size of the initial input. Neither changed anything; it is still the same error. Shouldn’t it work without changing the pooling operations of the model?

Thanks

The initial issue is that your activation just has a single sample and a single pixel. The suggested fixes would either increase the spatial size or increase the batch size directly.

What are you passing to the cuda() call?
Is the first issue solved now?

Are you sure you are using the same code? I don’t get any errors. Your original code has the line “criterion = nn.BCEWithLogitsLoss().cuda”; you need to change it to “criterion = nn.BCEWithLogitsLoss().cuda()” or “criterion = nn.BCEWithLogitsLoss()”. I am not sure you need to move the criterion to CUDA at all, since the inputs and targets are already there.
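
A small sketch of why the missing parentheses produce that message, using random stand-in tensors on the CPU. Without the call, `criterion` is the bound method `cuda` rather than a loss module, so the two tensors end up being passed to `cuda()` itself:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in tensors for the model output and the target.
output = torch.randn(2, 1, 4, 4)
mask = torch.randint(0, 2, (2, 1, 4, 4)).float()

# Missing parentheses: this is the bound method, not a loss module.
broken = nn.BCEWithLogitsLoss().cuda
print(isinstance(broken, nn.Module))  # False

# Calling it passes output and mask as arguments to cuda().
try:
    broken(output, mask)
    raised = False
    message = ""
except TypeError as err:
    raised = True
    message = str(err)
print(message)

# Fixed: create the module and call it with (output, target).
criterion = nn.BCEWithLogitsLoss()
loss = criterion(output, mask)
print(loss.item())
```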

Thank you two very much.

The mistake was the typo praveen_kandula pointed out!

How can I pass only the row*col pixel values, without the batch size and channel dimensions?

I’m not sure I understand the question correctly.
Most nn.Modules expect an input with the batch dimension in dim0 (RNNs are an exception in the default setup).
Your current error seems to be raised because network is a class definition and not a model instance.
If that’s the case, you would have to create an instance via:

class network(nn.Module):
    ...

model = network()
...
output = model(data)

PS: You can post code snippets by wrapping them into three backticks ```, which makes debugging easier. :wink:
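
Regarding the batch dimension: a minimal sketch, assuming a single-channel image stored as a plain rows x cols tensor (the 28x28 size and the conv layer are made up for illustration). The missing batch and channel dimensions can be added with `unsqueeze` before calling the model:

```python
import torch
import torch.nn as nn

# Hypothetical layer standing in for the first layer of the model.
conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)

# Only the pixel values: [rows, cols].
pixels = torch.randn(28, 28)

# Add the channel and batch dimensions: [1, 1, rows, cols].
batched = pixels.unsqueeze(0).unsqueeze(0)
print(batched.shape)  # torch.Size([1, 1, 28, 28])

out = conv(batched)
print(out.shape)      # torch.Size([1, 4, 28, 28])
```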

Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
What does this error mean?

The error means that your data has been pushed to the GPU (torch.cuda.FloatTensor), while the parameters of your model are still on the CPU.
To push all parameters and buffers of your model to the GPU, you could use:

model.to('cuda')
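
A device-agnostic variant of the same idea, as a sketch with a made-up one-layer model: pick the device once and move both the model and the data to it, so the code also runs on machines without a GPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical tiny model; .to() moves its parameters and buffers in place.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)

# For tensors, .to() returns a new tensor, so the result must be reassigned.
data = torch.randn(2, 3, 16, 16)
data = data.to(device)

# Input and weights now live on the same device.
param_device = next(model.parameters()).device
output = model(data)
print(param_device, data.device, output.shape)
```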

The size of the labels is torch.Size([50, 10]).
How can I convert it to a single dimension? Is the batch size (50) necessary or not?
Please let me know the code for it.
https://www.kaggle.com/basavarajrp4444/kernel755d8a6dc5/edit/run/36854797
Thank you.

If the target is one-hot encoded and you are dealing with 10 classes, you can create the class indices via:

torch.argmax(targets, dim=1)

If that’s not the case, I would need more information about what the dimensions of the target refer to.
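
As a quick sketch of the one-hot case, with randomly generated stand-in targets of shape [50, 10]: `argmax` over dim 1 removes the class dimension and keeps the batch dimension, giving one class index per sample.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in: 50 samples, 10 classes, one-hot encoded.
class_indices = torch.randint(0, 10, (50,))
targets = F.one_hot(class_indices, num_classes=10).float()
print(targets.shape)    # torch.Size([50, 10])

# One class index per sample; the batch dimension (50) is kept.
recovered = torch.argmax(targets, dim=1)
print(recovered.shape)  # torch.Size([50])
```

The resulting [50] tensor of class indices is the target format that `nn.CrossEntropyLoss` accepts.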

Actually, my dataset contains some multi-label targets and some targets that are not multi-label. How should I deal with that? Please help me fix this problem.
Notebook url:
https://www.kaggle.com/basavarajrp4444/kernel755d8a6dc5/edit/run/36854797
Thank you.

What are the “not multi label targets” then?
Are you dealing with different datasets using different classification types?

Your Kaggle kernel is a bit too long to debug it from scratch, so please add more information here. :wink: