RuntimeError: tensor.sub_(mean[:, None, None]).div_(std[:, None, None]) RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0 in transfer learning

I am using the transfer learning tutorial (https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html) to classify my own images: each image is labeled by whether it contains a woman (1) or not (0).
The images have three channels, but the size (length, width) differs from image to image: some are (520, 520), some are (120, 60), some are (1200, 300), and so on.

banner_transforms_train = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]
)
banner_transforms_test = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]
)
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import torch
import csv
import os


class bannerDataset(Dataset):
    def __init__(self, dataPath, label_name, df_labels,  transform=None):
        self.dataPath=dataPath     #the path to the data directory
        self.transform = transform  #a transform object
        #builds a list of (name,label) tuples          
        self.labels = list(df_labels[label_name])
        self.image_id = list(df_labels['image_id'])
        
    def __len__(self):
        return len(self.labels)
    
    def __getitem__(self, idx):
        imageName, imageLabel=self.image_id[idx], self.labels[idx]
        imagePath =  os.path.join(self.dataPath, imageName)
        image = Image.open(open(imagePath, 'rb'))
        
        return  image, imageLabel

class DatasetTransformer(torch.utils.data.Dataset):

    def __init__(self, base_dataset, transform):
        self.base_dataset = base_dataset
        self.transform = transform

    def __getitem__(self, index):
        img, target = self.base_dataset[index]
        return self.transform(img), target

    def __len__(self):
        return len(self.base_dataset)

bannerdata= bannerDataset(path2, 'woman', df_tag_repeat_del_woman)
import numpy as np

batch_size = 16
validation_split = .1
shuffle_dataset = True
random_seed= 42
dataset_size = len(bannerdata)
valid_ratio = 0.1
# Split it into training and validation sets
nb_train = int((1.0 - valid_ratio) * len(bannerdata))
nb_valid =  int(valid_ratio * len(bannerdata))
train_dataset, valid_dataset = torch.utils.data.dataset.random_split(bannerdata, [nb_train, nb_valid])
train_dataset = DatasetTransformer(train_dataset, banner_transforms_train)
valid_dataset = DatasetTransformer(valid_dataset, banner_transforms_test)

num_threads = 4
batch_size  = 32
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True,                # <-- this reshuffles the data at every epoch
                                          num_workers=num_threads)

valid_loader = torch.utils.data.DataLoader(dataset=valid_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False,
                                          num_workers=num_threads)

from torchvision import datasets, models, transforms
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import copy

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

def train_model(dataloader, model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0
            print('here0')
            for inputs, labels in dataloader[phase]:
                print('phase:', phase)
                inputs = inputs.to(device)
                labels = labels.to(device)
#                 print('labels:', labels)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model    

model_ft = train_model(dataloader, model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)

But I get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-102-9f7b9d2bfee0> in <module>
      1 model_ft = train_model(dataloader, model_ft, criterion, optimizer_ft, exp_lr_scheduler,
----> 2                        num_epochs=25)

<ipython-input-101-27371a5d97da> in train_model(dataloader, model, criterion, optimizer, scheduler, num_epochs)
     20             running_corrects = 0
     21             print('here0')
---> 22             for inputs, labels in dataloader[phase]:
     23                 print('phase:', phase)
     24                 inputs = inputs.to(device)

~/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    566         if self.rcvd_idx in self.reorder_dict:
    567             batch = self.reorder_dict.pop(self.rcvd_idx)
--> 568             return self._process_next_batch(batch)
    569 
    570         if self.batches_outstanding == 0:

~/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    606                 raise Exception("KeyError:" + batch.exc_msg)
    607             else:
--> 608                 raise batch.exc_type(batch.exc_msg)
    609         return batch
    610 


RuntimeError: Traceback (most recent call last):
  File "/home/xx/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/xx/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "<ipython-input-70-04e3287e2c1c>", line 9, in __getitem__
    return self.transform(img), target
  File "/home/xx/.local/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "/home/xx/.local/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 163, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/home/xx/.local/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 208, in normalize
    tensor.sub_(mean[:, None, None]).div_(std[:, None, None])
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0

By the way, df_tag_repeat_del_woman is a data frame like the following:

id         label
1.jpg      1
2.jpg      0
........

I think you might have RGBA images in your dataset! The error in your title is not the one you get in the script, BTW. The error comes from a mismatch between the number of channels in the image (4 in this case) and the number of channel means and stds given to Normalize in your transforms (3).
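One way to confirm this before training is to tally the PIL modes of the files. The following is a minimal sketch and not part of the original answer: `count_image_modes` is a hypothetical helper name, and it assumes the file names are relative to a single data directory, as in `bannerDataset`.

```python
import os
from collections import Counter

from PIL import Image


def count_image_modes(data_path, image_names):
    """Return a Counter mapping each PIL mode (e.g. 'RGB', 'RGBA') to a file count."""
    modes = Counter()
    for name in image_names:
        # Image.open only reads the header, so this stays cheap on large datasets.
        with Image.open(os.path.join(data_path, name)) as img:
            modes[img.mode] += 1
    return modes
```

With the variables from the question this could be called as `count_image_modes(path2, df_tag_repeat_del_woman['image_id'])`. Modes such as `RGBA`, `L`, `P` or `CMYK` produce a tensor whose channel count is not 3 after `ToTensor()`, which is what makes a 3-value `Normalize` fail.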

When you call image = Image.open(open(imagePath, 'rb')) (BTW you should replace that by image = Image.open(imagePath) which works just as well), add .convert('RGB').
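As a rough illustration of that change, the loading step could be pulled into a small function like the one below. This is only a sketch: `load_rgb` is a hypothetical name, and in the question's code its body would replace the `Image.open(...)` line inside `__getitem__`.

```python
from PIL import Image


def load_rgb(image_path):
    """Open an image file and return it as a 3-channel RGB PIL image."""
    with Image.open(image_path) as img:
        # convert() loads the pixel data and returns a new image, so the
        # file can be closed safely once the with-block ends.
        return img.convert("RGB")
```

Because every sample is now RGB, `ToTensor()` always yields a tensor of shape (3, H, W), which matches the three means and stds passed to `Normalize`.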

Be aware that this will turn the background (which is likely 0 by default, since transparency will hide it) into full black. This is not ideal and you should verify your images to determine what you should do with transparent ones. If the background is supposed to be white, you can use numpy to modify the images so that you replace transparent pixels with white ones. I also suggest that you do that offline before training instead of during training, as it would otherwise take unnecessary processing time on every epoch.
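A possible way to put the transparent images on a white background is sketched below. Note that it uses PIL's own `paste` with the alpha channel as a mask rather than the numpy approach mentioned above, and `flatten_on_white` is a hypothetical helper name.

```python
from PIL import Image


def flatten_on_white(img):
    """Composite an image onto a white background and return an RGB image."""
    rgba = img.convert("RGBA")  # gives every input an explicit alpha channel
    background = Image.new("RGB", rgba.size, (255, 255, 255))
    # The alpha band acts as the mask: fully transparent pixels keep the white
    # background, fully opaque pixels take the image colour.
    background.paste(rgba, mask=rgba.split()[3])
    return background
```

To do this offline, one could loop over the image folder once, call `flatten_on_white(Image.open(path))` on each file and save the result, so that training only ever reads plain RGB files.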

Hi, thanks for the answer. Before I used PyTorch, I used Keras, and my code worked fine there.

In Keras I used OpenCV to read and resize the same data, and it worked:

for filepath in train_batch:
    # print('filepath:', filepath)
    img = cv2.imread(filepath)
    img = cv2.resize(img, img_size)
    img = augment(img, np.random.randint(6))
    x_batch.append(img)
x_batch = np.array(x_batch, np.float32) / 255.
y_batch = y_train[start:end]

I don’t know how to solve this issue in PyTorch. I am new to PyTorch.

This is not related to PyTorch in itself, but to PIL and images in general.

Look here for a solution using PIL.

Thanks, I will check it and report back. But what about using OpenCV to read the data for PyTorch, as I did in Keras? That is simple.

I have solved it by using OpenCV to read the data and transforming it to an image with imagetoarray.
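For readers who want to take the OpenCV route, here is one way it might look. This is a sketch under assumptions, not the poster's actual code: both function names are hypothetical, and the result is converted to a PIL image so that the existing torchvision transforms can still be applied. With its default flag, `cv2.imread` returns a 3-channel BGR array and drops any alpha channel, which is probably why the Keras pipeline never hit this error.

```python
import numpy as np
from PIL import Image


def bgr_array_to_pil(bgr):
    """Convert an H x W x 3 uint8 BGR array (OpenCV's layout) to an RGB PIL image."""
    # Reverse the channel axis (BGR -> RGB) and make the memory contiguous.
    rgb = np.ascontiguousarray(bgr[:, :, ::-1])
    return Image.fromarray(rgb)


def load_with_opencv(image_path):
    """Read a file with OpenCV and return it as an RGB PIL image."""
    import cv2  # imported here so bgr_array_to_pil works without OpenCV installed

    bgr = cv2.imread(image_path, cv2.IMREAD_COLOR)  # always 3 channels, BGR order
    if bgr is None:  # cv2.imread signals failure with None instead of raising
        raise FileNotFoundError("could not read image: " + image_path)
    return bgr_array_to_pil(bgr)
```

Keep in mind that the channel reversal matters with a pretrained ResNet, since the `Normalize` means and stds in the question are given in RGB order.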