ValueError: Expected input batch_size (324) to match target batch_size (4)

train_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(p=0.6),
    # transforms.ColorJitter(brightness=3),
    transforms.RandomRotation(degrees=30),
    transforms.ToTensor(),
])

TypeError: function takes exactly 1 argument (3 given)
Converting the image to grayscale in the transform function gives the above error.

That’s the second error I mentioned here:

Your encoded_output shape looks as if it contains logits for a multi-class segmentation, while the targets are for a multi-class classification with another batch size.

Your labels seem to have the shape of a multi-class classification, so you might need to use linear layers at the end to output [batch_size, nb_classes], while your model currently outputs spatial logits.
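For illustration, a minimal sketch of such a classification head (the shapes and nb_classes here are placeholders, not taken from your model):

import torch
import torch.nn as nn

x = torch.randn(4, 64, 8, 8)               # hypothetical encoder output [batch_size, channels, height, width]
nb_classes = 10                             # placeholder value
fc = nn.Linear(64 * 8 * 8, nb_classes)      # in_features = flattened feature count

out = fc(x.view(x.size(0), -1))             # flatten spatial logits, keep the batch dim
print(out.shape)                            # torch.Size([4, 10]) -> [batch_size, nb_classes]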

Got You!
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(
            # nn.Dropout2d(dropout_p=0.5),
            nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
        )
        self.decoder = nn.Sequential(
            nn.MaxUnpool2d(2, 2),
            nn.ConvTranspose2d(64, 32, kernel_size=3),
            nn.ReLU(True),
            nn.MaxUnpool2d(2, 2),
            nn.ConvTranspose2d(32, 16, kernel_size=3),
            nn.ReLU(True),
            nn.MaxUnpool2d(2, 2),
            nn.ConvTranspose2d(16, 8, kernel_size=3),
            nn.Linear(* * *8)
You mean adding a Linear at the end of the decoder, but I am not sure what to put in the first two parameters, i.e. how to calculate the size of the input features.

torch size of the img
torch.Size([20, 1, 240, 320])
torch size of encoded_output
torch.Size([20, 64, 120, 160])

I believe it is right now, but I am still stuck with the cross-entropy issue and not sure how to set the parameters of Linear. I need help.
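(A minimal sketch of that calculation, derived only from the encoded_output shape posted above; the choice of out_features is a separate design decision:)

# flattened feature count per sample from encoded_output shape [20, 64, 120, 160]
in_features = 64 * 120 * 160       # = 1_228_800
# e.g. nn.Linear(in_features, nb_classes) applied after x = x.view(x.size(0), -1)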

Sorry for missing this, but based on your architecture I don’t think this is your real use case.
Could you explain what your target defines?
Your architecture suggests a model which might be similar to an autoencoder, while your target seems to come from a classification use case.

@ptrblck
Thank you for getting back. Yes, my architecture is an autoencoder and I want to train the model to extract features from the image dataset I have provided. I have made a few changes to the autoencoder; however, I am getting the error below.
I have added the linear layers and hard-coded the out_features based on the image sizes. I feel the out_features values are arbitrary, which is why I chose them. Please help me out.

RuntimeError: size mismatch, m1: [18240 x 77], m2: [70224 x 20000] at /Users/distiller/project/conda/conda-bld/pytorch_1579022061893/work/aten/src/TH/generic/THTensorMath.cpp:136

Please find the code below:

Define Autoencoder

# dropout_p = 0.5
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(
            # nn.Dropout2d(dropout_p=0.5),
            nn.Conv2d(1, 6, kernel_size=(5, 5), stride=1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(6, 16, kernel_size=(5, 5), stride=1),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),
            nn.Linear(in_features=70224, out_features=20000, bias=True),
            nn.Linear(in_features=20000, out_features=14000, bias=True),
            nn.Linear(in_features=14000, out_features=2000, bias=True)
            # nn.MaxPool2d(2, 2),
        )
        self.decoder = nn.Sequential(
            nn.MaxUnpool2d(2, 2),
            nn.ConvTranspose2d(64, 32, kernel_size=3),
            nn.ReLU(True),
            nn.MaxUnpool2d(2, 2),
            nn.ConvTranspose2d(32, 16, kernel_size=3),
            nn.ReLU(True),
            nn.MaxUnpool2d(2, 2),
            nn.ConvTranspose2d(16, 8, kernel_size=3),
            # nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        encoded_x = x
        x = self.decoder(x)
        print(x.shape)
        # x = x.view(-1, 16*5*7)
        x = F.log_softmax(self.fc2(x), dim=0)
        return x, encoded_x

num_epochs = 5
learning_rate = 1e-5
model = Autoencoder()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for data in loader_train_set:
        img, label = data
        print(img.size())
        img = torch.tensor(img)
        labels = labels.view(-1)
        img = img[:, 0:1]
        encoded_output = model.encoder(img)

        print(encoded_output.size())

        loss = criterion(encoded_output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.data()))


Kindly help.

0%| | 0/469 [00:00<?, ?it/s]torch.Size([128, 10, 14, 14])
torch.Size([25088, 10])
torch.Size([25088, 10])
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:38: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.

ValueError Traceback (most recent call last)
in ()
4
5 for epoch in range(1, 3):
----> 6 train(model, device, train_loader, optimizer, epoch)
7 #test(model, device, test_loader)

1 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
2111 if input.size(0) != target.size(0):
2112 raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
-> 2113 .format(input.size(0), target.size(0)))
2114 if dim == 2:
2115 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

ValueError: Expected input batch_size (25088) to match target batch_size (128).

from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)     # input -? output? RF 3
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)    # RF 5
        self.pool1 = nn.MaxPool2d(2, 2)                 # RF 10
        self.conv3 = nn.Conv2d(32, 10, 3, padding=1)    # RF 12
        self.conv4 = nn.Conv2d(128, 256, 3, padding=1)  # RF 14
        self.pool2 = nn.MaxPool2d(2, 2)                 # RF 28
        self.conv5 = nn.Conv2d(256, 10, 3)              # RF 30
        self.conv6 = nn.Conv2d(512, 10, 3)              # RF 32
        self.conv7 = nn.Conv2d(1024, 10, 3)

    def forward(self, x):
        # x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x)))))
        # x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x)))))
        # x = F.relu(self.conv6(F.relu(self.conv5(x))))
        # x = F.relu(self.conv7(x))
        # x = F.log_softmax(x)
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool1(x)
        x = F.relu(x)
        x = self.conv3(x)
        print(x.shape)
        # x = F.relu(x)
        # x = self.conv4(x)
        # x = F.relu(x)
        # x = self.pool2(x)
        # x = F.relu(x)
        # x = self.conv5(x)
        # x = F.relu(x)
        # x = self.conv7(x)
        x = x.view(-1, 10)
        print(x.shape)
        x = F.log_softmax(x)
        print(x.shape)
        # x = x.view(-1, 10)

        return x

!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

torch.manual_seed(1)
batch_size = 128

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)

from tqdm import tqdm

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        pbar.set_description(desc=f'loss={loss.item()} batch_id={batch_idx}')

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(1, 3):
    train(model, device, train_loader, optimizer, epoch)
    # test(model, device, test_loader)

Try to change x = x.view(-1, 10) to x = x.view(x.size(0), -1) and rerun the code.
This should keep the batch size and could yield a shape mismatch in the feature dimension.
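A minimal sketch of the suggested change; the added fc layer is an assumption, sized only from the printed activation shape [128, 10, 14, 14], to map the flattened features to 10 classes:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(128, 10, 14, 14)     # stands in for the output of conv3 in the posted model
fc = nn.Linear(10 * 14 * 14, 10)     # hypothetical classification layer

x = x.view(x.size(0), -1)            # [128, 1960] -- batch size is kept
x = F.log_softmax(fc(x), dim=1)      # [128, 10], matches the target batch size
print(x.shape)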

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier :wink:

Thanks a lot.
The code is working.

Hello,

I am performing segmentation, and I am facing a similar issue.

ValueError: Expected input batch_size (250880) to match target batch_size (10).

Here is my code, please help me to resolve this issue.

class Encoder(nn.Module):
    def __init__(self, n_in, n_out, activation):
        super(Encoder, self).__init__()

        # Encoder consisting of conv layer -> batch normalization ->
        # activation function.

        # n_in: Number of features in to encoder.
        # n_out: Number of features out of encoder
        # activation: Activation function.

        self.block = nn.Sequential(
            *([nn.Conv2d(n_in, n_out, kernel_size=3, padding=1),
               nn.BatchNorm2d(n_out),
               activation, ])
        )
        self.block_out = nn.Sequential(
            *([nn.Conv2d(n_out, n_out, kernel_size=3, padding=1),
               nn.BatchNorm2d(n_out),
               activation, ])
        )

    def forward(self, x):                           # Forward pass for encoder.

        out = self.block(x)
        out = self.block_out(out)
        return(out)


class Decoder(nn.Module):
    def __init__(self, n_in, n_mid, n_out, activation):
        super(Decoder, self).__init__()

        # Decoder consisting of conv layer -> batch normalization ->
        # activation function -> transposed convolution.

        # n_in: Number of features in to decoder.
        # n_mid: Number of features in center part of decoder.
        # n_out: Number of features out of decoder.
        # activation: Activation function.

        self.block = nn.Sequential(
            *([nn.Conv2d(n_in, n_mid, kernel_size=3, padding=1),
               nn.BatchNorm2d(n_mid),
               activation, ])
        )
        self.block_out = nn.Sequential(
            *([nn.Conv2d(n_mid, n_out, kernel_size=3, padding=1),
               nn.BatchNorm2d(n_out),
               activation,
               nn.ConvTranspose2d(n_out, n_out, kernel_size=4,
                                  stride=2, padding=1, bias=False), ])
        )

    def forward(self, x):                           # Forward pass for decoder.

        out = self.block(x)
        out = self.block_out(out)

        return(out)


class Unet(nn.Module):
    def __init__(self, num_classes, activation):
        super(Unet, self).__init__()

        # U-net with dropout included in the center block, dropout rate = 0.5.
        # Max-pooling is performed in the forward pass function.
        # Last part of forward pass reshapes output into shape
        # (Number of pixels * Number of images, Number of classe ) to fit
        # cross entropy cost of Pytorch.

        self.enc1 = Encoder(3, 64, activation)
        self.enc2 = Encoder(64, 128, activation)
        self.enc3 = Encoder(128, 256, activation)
        self.enc4 = Encoder(256, 512, activation)

        self.center = nn.Sequential(
            *([nn.Dropout2d(),
               nn.Conv2d(512, 1024, kernel_size=3, padding=1),
               nn.BatchNorm2d(1024),
               activation,
               nn.Dropout2d(),
               nn.Conv2d(1024, 512, kernel_size=3, padding=1),
               nn.BatchNorm2d(512),
               activation,
               nn.ConvTranspose2d(512, 512, kernel_size=4, stride=2,
                                  padding=1), ])
        )

        self.dec4 = Decoder(1024, 512, 256, activation)
        self.dec3 = Decoder(512, 256, 128, activation)
        self.dec2 = Decoder(256, 128, 64, activation)
        self.dec1 = nn.Sequential(
            *([nn.Conv2d(128, 64, kernel_size=3, padding=1),
               nn.BatchNorm2d(64),
               activation,
               nn.Conv2d(64, 64, kernel_size=3, padding=1),
               nn.BatchNorm2d(64),
               activation,
               nn.Conv2d(64, num_classes, kernel_size=1, padding=0), ])
        )

        for m in self.modules():
            self.weight_init(m)

    def forward(self, x):                           # Forward pass for network.

        enc1 = self.enc1(x)
        enc2 = self.enc2(nF.max_pool2d(enc1, kernel_size=2, stride=2))
        enc3 = self.enc3(nF.max_pool2d(enc2, kernel_size=2, stride=2))
        enc4 = self.enc4(nF.max_pool2d(enc3, kernel_size=2, stride=2))

        center = self.center(nF.max_pool2d(enc4, kernel_size=2, stride=2))

        dec4 = self.dec4(torch.cat([center, enc4], 1))
        dec3 = self.dec3(torch.cat([dec4, enc3], 1))
        dec2 = self.dec2(torch.cat([dec3, enc2], 1))
        dec1 = self.dec1(torch.cat([dec2, enc1], 1))

        out = dec1.permute(1, 0, 2, 3).contiguous()
        out = out.view(2, -1)
        out = out.permute(1, 0)

        return out

    def weight_init(self, m):
        if isinstance(m, nn.Conv2d):
            init.kaiming_normal(m.weight.data)
            init.constant(m.bias.data, 1)
        if isinstance(m, nn.BatchNorm2d):
            init.constant(m.weight.data, 1)
            init.constant(m.bias.data, 0)

class ToTensor(object):
    def __call__(self, sample):
        image, label = sample['image'], sample['label']
        return {'image': F.to_tensor(image), 'label': F.to_tensor(label)}


model = Unet( num_classes=1, activation=nn.ReLU() ).cuda()
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()
lr_scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 100, 125], gamma=0.1)

transform1 = transforms.Compose([ToTensor()])

dataset = PolypsDataset(image_dir, label_dir, transform=transform1)
dataloader = DataLoader(dataset, batch_size = batch_size, shuffle=True, num_workers = batch_size, drop_last = True)
dataset_sizes = len(dataset)
batch_num = int(dataset_sizes/batch_size)

for epoch in range(epochs):
  
  print('Starting epoch {}/{}.'.format(epoch + 1, epochs))
  
  model.train()
  
  lr_scheduler.step()
  epoch_loss = 0
  
  for i_batch, sample_batched in enumerate(dataloader):
    optimizer.zero_grad()
    train_image = sample_batched['image']
    train_label = sample_batched['label']
    
    if torch.cuda.is_available():
      train_image = train_image.cuda()
      train_label = train_label.cuda()
    
    label_pred = model(train_image)
    label_probs = torch.sigmoid(label_pred)
    
    loss = criterion(label_probs, train_label)
    loss.backward()

    optimizer.step()
    epoch_loss += loss.item()
    
    print('Train Loss: {}'.format(epoch_loss/batch_num))

Could you explain what these lines of code are doing:

        out = dec1.permute(1, 0, 2, 3).contiguous()
        out = out.view(2, -1)
        out = out.permute(1, 0)

It seems that you are swapping the channel dimension with the batch dimension and later flattening the batch dimension into a feature dimension?
I'm not familiar with your use case, but this doesn't sound right and might yield the shape mismatch.

Thanks very much for replying to me.

Actually, I am using this U-Net architecture code for polyp segmentation (images and their masks). This code was not written by me; I am a new researcher, but I want to share a few details with you. My images are 224x224 (RGB) and my masks are 224x224 (black and white). I have presented the U-Net architecture here. Please suggest a way to make this code work.

Unet(
(enc1): Encoder(
(block): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block_out): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
)
(enc2): Encoder(
(block): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block_out): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
)
(enc3): Encoder(
(block): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block_out): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
)
(enc4): Encoder(
(block): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block_out): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
)
(center): Sequential(
(0): Dropout2d(p=0.5, inplace=False)
(1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU()
(4): Dropout2d(p=0.5, inplace=False)
(5): Conv2d(1024, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): ReLU()
(8): ConvTranspose2d(512, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
)
(dec4): Decoder(
(block): Sequential(
(0): Conv2d(1024, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block_out): Sequential(
(0): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): ConvTranspose2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
)
)
(dec3): Decoder(
(block): Sequential(
(0): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block_out): Sequential(
(0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): ConvTranspose2d(128, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
)
)
(dec2): Decoder(
(block): Sequential(
(0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(block_out): Sequential(
(0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): ConvTranspose2d(64, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
)
)
(dec1): Sequential(
(0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU()
(6): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
)
)

Based on the initial model definition it seems that only nn....2d layers are used (such as nn.Conv2d and nn.MaxPool2d).
These layers expect the input to have the shape [batch_size, channels, height, width] and will output the same number of dimensions with the same meaning. Note that the number of channels, height, and width might of course change based on the layer, but the “meaning” of these dimensions stays the same.
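For illustration, a small shape check with arbitrary values (not your data), showing that the 4-dimensional layout is preserved through such layers:

import torch
import torch.nn as nn

x = torch.randn(10, 3, 224, 224)                    # [batch_size, channels, height, width]
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2, 2)
out = pool(conv(x))
print(out.shape)                                    # torch.Size([10, 64, 112, 112]) -- still 4 dims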

In this permutation code, this would happen:

# dec1 is the output of self.dec1 (nn.Conv2d as last layer)
# dec1.shape: [batch_size, channels=num_classes, height, width]
out = dec1.permute(1, 0, 2, 3).contiguous()
# out.shape: [channels=num_classes, batch_size, height, width]
out = out.view(2, -1)
# at this point, bad things might happen, as the code is not even using `num_classes` for dim0, but a hardcoded 2. Where is this value coming from? Is the model only defined for num_classes=2?
# out.shape: [2, "rest"] 
out = out.permute(1, 0)
# out.shape: ["rest", 2] 

Please have a look at my comments in the permutation code.
I would recommend completely removing this code part and using the output as [batch_size, nb_classes, height, width].
For a multi-class segmentation use case, you could use nn.CrossEntropyLoss (which is already in the code), which expects raw logits in the mentioned shape, and the targets should have the shape [batch_size, height, width] containing class indices in the range [0, nb_classes-1].
In your current code you are also applying torch.sigmoid on the model output, which is not expected for nn.CrossEntropyLoss.
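A minimal sketch of the expected shapes (batch size, spatial size, and nb_classes are arbitrary example values):

import torch
import torch.nn as nn

nb_classes = 2
output = torch.randn(10, nb_classes, 224, 224)          # raw logits: [batch_size, nb_classes, H, W]
target = torch.randint(0, nb_classes, (10, 224, 224))   # class indices: [batch_size, H, W]

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)                        # no sigmoid/softmax applied to the output
print(loss.item())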

I found out the mystery of out = out.view(2, -1): in his model, the researcher was using 2 classes,

model = Unet(num_classes=2, activation=nn.ReLU()).cuda(); maybe he is considering image and label as two classes, or maybe polyp and black background as two classes.

But I am still confused about how to make this model work and analyse its output.

Let’s start with removing the “permutation” code and check the model output shape.
After removing the code, the output should have the shape [batch_size, nb_classes=2, height, width].
For a multi-class segmentation using nn.CrossEntropyLoss, the target should have the shape [batch_size, height, width] and contain the class indices in the range [0, nb_classes-1].

Could you check the shapes of the output and target and report, if they match?

Thanks very much for your help.

With the help of my friend, I fixed the U-Net model issue, and now my training and validation are working. But unfortunately, I am badly stuck in the testing module. I am facing the following error again and again. Please review this code; I will really appreciate it.

INFO: Creating dataset with 137 examples

TypeError Traceback (most recent call last)
in ()
40 test_label = test_label.to(device=device, dtype=torch.float32)
41
—> 42 predict_label = net(test_image)
43 predict_probs = torch.sigmoid(predict_label)
44

4 frames
/usr/local/lib/python3.6/dist-packages/torchvision/transforms/functional.py in pad(img, padding, fill, padding_mode)
286     """
287     if not _is_pil_image(img):
→ 288       raise TypeError('img should be PIL Image. Got {}'.format(type(img)))
289
290     if not isinstance(padding, (numbers.Number, tuple)):

TypeError: img should be PIL Image. Got <class 'torch.Tensor'>

My Code is here

Dataset

class BasicDataset(Dataset):
    def __init__(self, imgs_dir, masks_dir, scale=1):
        self.imgs_dir = imgs_dir
        self.masks_dir = masks_dir
        self.scale = scale
        assert 0 < scale <= 1, 'Scale must be between 0 and 1'

        self.ids = [splitext(file)[0] for file in listdir(imgs_dir)
                    if not file.startswith('.')]
        logging.info(f'Creating dataset with {len(self.ids)} examples')

    def __len__(self):
        return len(self.ids)

    @classmethod
    def preprocess(cls, pil_img, scale):
        w, h = pil_img.size
        newW, newH = int(scale * w), int(scale * h)
        assert newW > 0 and newH > 0, 'Scale is too small'
        pil_img = pil_img.resize((newW, newH))

        img_nd = np.array(pil_img)

        if len(img_nd.shape) == 2:
            img_nd = np.expand_dims(img_nd, axis=2)

        # HWC to CHW
        img_trans = img_nd.transpose((2, 0, 1))
        if img_trans.max() > 1:
            img_trans = img_trans / 255

        return img_trans

    def __getitem__(self, i):
        idx = self.ids[i]
        mask_file = glob(self.masks_dir + idx + '.*')
        img_file = glob(self.imgs_dir + idx + '.*')

        assert len(mask_file) == 1, \
            f'Either no mask or multiple masks found for the ID {idx}: {mask_file}'
        assert len(img_file) == 1, \
            f'Either no image or multiple images found for the ID {idx}: {img_file}'
        mask = Image.open(mask_file[0])
        img = Image.open(img_file[0])

        assert img.size == mask.size, \
            f'Image and mask {idx} should be the same size, but are {img.size} and {mask.size}'

        img = self.preprocess(img, self.scale)
        mask = self.preprocess(mask, self.scale)

        return {
            'image': torch.from_numpy(img).type(torch.FloatTensor),
            'mask': torch.from_numpy(mask).type(torch.FloatTensor)
        }

Testing Module

net = UNet(n_channels=3, n_classes=1, bilinear=True)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net.cuda()
net.eval()

test_dir_root = '/Datasets/test/'

test_image_dir = test_dir_root + 'images/'
test_label_dir = test_dir_root + 'labels/'

for checkpoint in range(1, 15):

    net.load_state_dict(torch.load(dir_checkpoint + 'CP_epoch' + str(5 * checkpoint - 4) + '.pth', map_location=device))

    test_dataset = Dataset(test_image_dir, test_label_dir, scale=1)

    dataloader = DataLoader(test_dataset, batch_size=4)

    dataset_sizes = len(test_dataset)

    batch_num = int(dataset_sizes / batch_size)

    for i_batch, sample_batched in enumerate(dataloader):

        test_image = sample_batched['image']
        test_label = sample_batched['mask']

        if torch.cuda.is_available():
            test_image = test_image.to(device=device, dtype=torch.float32)
            test_label = test_label.to(device=device, dtype=torch.float32)

            predict_label = net(test_image)
            predict_probs = torch.sigmoid(predict_label)

I am getting an error which says TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not int. I have used Keras till now and have just switched to PyTorch. Can you please help?

Also, since I have just started using PyTorch, I would be grateful if you could recommend some resources from where I can learn PyTorch.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms


transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train =True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)  #for putting our data into batches
testset = torchvision.datasets.CIFAR10(root="./data", train =False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=True, num_workers=2)

#define the classes
classes = ('plane','car','bird','cat','deer', 'dog', 'frog', 'horse', 'ship', 'truck')

class CIFARmodel(nn.Module):
    
    def __init__(self):
        super(CIFARmodel,self).__init__()
        
        #input is 32X32 and the padding is 2 for same padding
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, padding=2,stride=2)
        #feature map size is 3X3 by pooling
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride = 2)
        self.conv2 = nn.Conv2d(6, 16, 5, padding=2)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride = 2)
        #feature map size is 8X8
        self.fc1 = nn.Linear(16*8*8, 128)
        self.fc2 = nn.Linear(128,64)
        self.fc3 = nn.Linear(64,10)
        
    def forward(self,x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        print(x.shape)
        x = x.view(-1,16*8*8)    #flatten
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.log_softmax(x)
    
model = CIFARmodel()
model
        
for p in model.parameters(): 
    print(p.size())
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr = 0.001, momentum=0.9)


train_loss=[]
train_acc=[]
for epochs in range(3):
    running_loss = 0.0
    for data in enumerate(trainloader):
        #gets the input, data is a list [inputs, labels]
        inputs, labels = data
        #initialze the value of gradient parameter with zero
        optimizer.zero_grad()
        
        #forward propogation -> backward propogation -> optimization
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        train_loss.append(loss.data[0])
        loss.backward()
        
        optimizer.step()
        
        #print the statistics
        prediction = outputs.data.max(1)[1]   #first column has the actuaal prob
        accuracy = prediction.eq(labels.data).sum()/4 *100
        train_acc.append(accuracy)
        running_loss+=loss.item()
        if i%2000 ==0:       #printing after 2000 mini batches
            print('Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(i, loss.data[0], accuracy))
        
        running_loss = 0.0
print('FINISHED TRAINING')

torchvision.transforms.functional.pad expects a PIL.Image, while you are passing a tensor to this function.
If you want to pad the tensor, you could use torch.nn.functional.pad instead.
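For example, a minimal sketch of padding a tensor directly (padding sizes are arbitrary; the last tuple pads the width by (left, right) and the height by (top, bottom)):

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 224, 224)
padded = F.pad(x, (1, 1, 1, 1))    # pad width and height by 1 on each side
print(padded.shape)                # torch.Size([1, 3, 226, 226])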


Most likely input is an int instead of a tensor.
Could you check the type before passing it to the model?
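A quick check is to print type(inputs) right before the forward pass. As a minimal illustration of how an int can end up there when iterating with enumerate (stand-in data, not your loader):

# enumerate() yields (index, batch) pairs, so unpacking such a pair as
# (inputs, labels) makes `inputs` the integer batch index, not the image tensor.
batches = [("img_batch_0", "label_batch_0"), ("img_batch_1", "label_batch_1")]
for data in enumerate(batches):
    inputs, labels = data
    print(type(inputs))   # <class 'int'>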

I would recommend going through some tutorials to see some examples of using PyTorch.
Specific questions can be answered in this forum. :slight_smile:

I used this code to check the data type:

dataiter = iter(trainloader)
images, label = dataiter.next()
print(images)

The output shows a tensor:

tensor([[[[-0.1216, -0.1216, -0.1216,  ..., -0.2235, -0.2000, -0.1059],
          [-0.1137, -0.0745, -0.0510,  ..., -0.2549, -0.2627, -0.0980],
          [-0.0902, -0.0902, -0.0588,  ..., -0.2706, -0.3020, -0.1216],
          ...,
          [ 0.1765,  0.4275,  0.3020,  ..., -0.2627, -0.3647, -0.3098],
          [ 0.2000,  0.3412,  0.1608,  ..., -0.2706, -0.2863, -0.1843],
          [ 0.1059,  0.1686,  0.0824,  ...,  0.0353,  0.0275,  0.1059]],

         [[-0.3412, -0.3412, -0.3333,  ..., -0.4039, -0.3804, -0.3020],
          [-0.3020, -0.2078, -0.1608,  ..., -0.3020, -0.3098, -0.2157],
          [-0.2471, -0.1451, -0.0980,  ..., -0.2392, -0.2471, -0.1529],
          ...,